An AI generated image, two similar black and white photos side by side, women on a road with palm trees.

Image created with Midjourney, prompted with “Stereogram,” June 2024.

“Definitions belong to the definers, not the defined.”
― Toni Morrison, Beloved

Generative Artificial Intelligence is sometimes described as a remix engine. It is one of the more easily graspable metaphors for understanding these images, but it’s also wrong.

As a digital collage artist working before the rise of artificial intelligence, I was always remixing images. I would do a manual search of the public domain works available through the Internet Archive or Flickr Commons. I would download images into folders named for specific characteristics of various images. An orange would be added to the folder for fruits, but also round, and the color orange; cats could be found in both cats and animals.

I was organizing images solely on visual appearance. It was anticipating their retrieval whenever certain needs might emerge. If I needed something round to balance a particular composition, I could find it in the round folder, surrounded by other round things: fruits and stones and images of the sun, the globes of planets and human eyes.

Once in the folder, the images were shapes, and I could draw from them regardless of what they depicted. It didn’t matter where they came from. They were redefined according to their anticipated use.

A Churning

This was remixing, but I look back on this practice with fresh eyes when I consider the metaphor as it is applied to diffusion models. My transformation of source material was not merely based on their shapes, but their meaning. New juxtapositions emerged, recontextualizing those images. They retained their original form, but engaged in new dialogues through virtual assemblages.

As I explore AI images and the datasets that help produce them, I find myself moving away from the concept of the remix. The remix is a form of picking up a melody and evolving it, and it relies on human expression. It is a relationship, a gesture made in response to another gesture.

To believe we could “automate” remixing assumes too much of the systems that do this work. Remixes require an engagement with the source material. Generative AI systems do not have any relationship with the meanings embedded into the materials they reconfigure. In the absence of engagement, what machines do is better described as a churn, combining two senses of the word. Generative AI models churn images in that they dissolve the surface of these images. Then it churns out new images, that is, “to produce mechanically and in great volume.”

Of course, people can diffuse the surface meaning of images too. As a collagist, I could ignore the context of any image I liked. We can look at the stereogram below and see nothing but the moon. We don’t have to think about the tools used to make that image, or how it was circulated, or who profited from its production. But as a collagist, I could choose to engage with questions that were hidden by the surfaces of things. I could refrain from engagements with images, and their ghosts, that I did not want to disturb.

Keystone Stereoview of the moon at 17 days by William Creswell on Flickr. Public Domain. https://flic.kr/p/mR4N5u

Keystone Stereoview of the moon at 17 days by William Creswell on Flickr. Public Domain. Source.

Actions taken by a person can model actions taken by a machine. But the ability to automate a person’s actions does not suggest the right or the wisdom to automate those actions. I wonder if, in the case of diffusion models, we shouldn’t more closely scrutinize the act of prising meaning from an image and casting it aside. This is something humans do when they are granted, or demand, the power to do so. The automation of that power may be legal. But it also calls for thoughtful restraint.

In this essay, I want to explore the power to inscribe into images. Traditionally, the power to extract images from a place has been granted to those with the means to do so. Over the years, the distribution and circulation of images has been balanced against those who hold little power to resist it. In the automation of image extraction for training generative artificial intelligence, I believe we are embedding this practice into a form of data colonialism. I suggest that power differentials haunt the images that are produced by AI, because it has molded the contents of datasets, and infrastructures, that result in those images.

The Crying Child

Temi Odumosu has written about the “digital reproduction of enslaved and colonized subjects held in cultural heritage collections.” In The Crying Child, Odumosu looks at the role of the digital image as a means of extending the life of a photographic memory. But this process is fraught, and Odumosu dedicates the paper to “revisiting those breaches (in trust) and colonial hauntings that follow photographed Afro-diasporic subjects from moment of capture, through archive, into code” (S290). It does so by focusing on a single image, taken in St. Croix in 1910:

“This photograph suspends in time a Black body, a series of compositional choices, actions, and a sound. It represents a child standing alone in a nondescript setting, barefoot with overpronation, in a dusty linen top too short to be a dress, and crying. Clearly in visible distress, with a running nose and copious tears rolling down its face, the child’s crinkled forehead gives a sense of concentrated energy exerted by all the emotion … Emotions that object to the circumstances of iconographic production.”

The image emerges from the Royal Danish Library. It was taken by Axel Ovesen, a military officer who operated a commercial photography business. The photograph was circulated as a postcard, and appears in a number of personal and commercial photo albums Odumosu found in the archive.

The unnamed crying child appeared to the Danish colonizers of the island as an amusement, and is labeled only as “the grumpy one” (in the sense of “uncooperative”). The contexts in which this image appeared and circulated were all oriented toward soothing and distancing the colonizers from the colonized. By reframing it as a humorous novelty, the power to apply and remove meaning is exercised on behalf of those who purchase the postcard and mail it to others for a laugh. What is literally depicted in these postcards is, Odumosi writes, “the means of production, rights of access, and dissemination” (S295).

I am describing this essay at length because the practice of categorizing this image in an archive is so neatly aligned with the collection and categorization of training data for algorithmic images. Too often, the images used for training are treated solely as data, and training defended as an act that leaves no traces. This is true. The digital copy remains intact.

But the image is degraded, literally, step by step until nothing remains but digital noise. The image is churned, the surface broken apart, and its traces stored as math tucked away in some vector space. It all seems very tidy, technical, and precise, if you treat the image as data. But to say so requires us to agree that the structures and patterns of the crying child in the archive — the shape of the child’s body, the details of the wrinkled skin around the child’s mouth — are somehow distinct from the meaning of the image.

Because by diffusing these images into an AI model, and pairing existing text labels to it within the model, we extend the reach of Danish colonial power over the image. For centuries, archives have organized collections into assemblages shaped and informed by a vision of those with power over those whose power is held back. The colonizing eye sets the crying child into the category of amusements, where it lingers until unearthed and questioned.

If these images are diffused into new images — untraceable images, images that claim to be without context or lineage — how do we uncover the way that this power is wielded and infused into the datasets, the models, and the images ultimately produced by the assemblage? What obligations linger beneath the surfaces of things?

Every Archive a Collage

Collage can be one path for people to access these images and evaluate their historical context. The human collage maker, the remixer, can assess and determine the appropriateness of the image for whatever use they have in mind. This can be an exercise of power, too, and it ought to be handled consciously. It has featured as a tool of Situationist detournement, a means of taking images from advertising and propaganda to reveal their contradictions and agendas. These are direct confrontations, artistic gestures that undermine the organization of the world that images impose on our sense of things. The collage can be used to exert power or challenge the status quo.

Every archive is a collage, a way of asserting that there is a place for things within an emergent or imposed structure. The scholar and artist Beth Coleman’s work points to the reversal of this relationship, citing W.E.B. Du Bois’ exhibition at the 1900 Paris Exposition. M. Murphy writes,

“Du Bois’s use of [photographic] evidence disrupted racial kinds rather than ordered them … Du Bois’s exhibition was crucially not an exhibit of ‘facts’ and ‘data’ that made black people in Georgia knowable to study, but rather a portrait in variation and difference so antagonistic to racist sociology as to dislodge race as a coherent object of study” (71).

The imposed structures of algorithmically generated images rely on facts and data, defined a certain way. They struggle with context and difference. The images these tools produce are constrained to the central tendencies of the data they were trained on, an inherently conformist technology.

To challenge these central tendencies means to engage with the structures it imposes on this data, and to critique this churn of images into data to begin with. Matthew Fuller and Eyal Weizman describe “hyper-aesthetic” images as not merely “part of a symbolic regime of representation, but actual traces and residues of material relations and of mediatic structures assembled to elicit them” (80).

Consider the stereoscope. Once the most popular means of accessing photographs, the stereoscope relied on a trick of the eye, akin to the use of 3D glasses. It combined two visions of the same scene taken from the slight left and slight right of the other. When viewed through a special viewing device, the human eye superimposes them, and the overlap creates the illusion of physical depth in a flat plane. We can find some examples of these on Flickr (including the Danish Film Museum) or at The Library of Congress’ Stereograph collection.

The time period in which this technology was popular happened to overlap with an era of brutal colonization, and the archival artifacts of this era contain traces of how images projected power.

I was struck by stereoscopic images of American imperialism in the Philippines during the US occupation, starting in 1899. They aimed to “bring to life” images of Filipino men dying in fields and other images of war, using the spectacle of the stereoscopic image as a mechanism for propaganda. These were circulated as novelties to Americans on the mainland, a way of asserting a gaze of dominance over those they occupied.

In the long American tradition of infotainment, the stereogram fused a novel technological spectacle with the effort to assert military might, paired with captions describing the US cause as just and noble while severely diminishing the numbers of civilian casualties. In Body Parts of Empire : Visual Abjection, Filipino Images, and the American Archive, Nerissa Balce writes that

“The popularity of war photographs, stereoscope viewers, and illustrated journals can be read as the public’s support for American expansion. It can also be read as the fascination for what were then new imperial ‘technologies of vision’” (52).

The link between stereograms as a style of image and the gaze of colonizing power is now deeply entrenched into the vector spaces of image synthesis systems. Prompt Midjourney for the style of a stereogram, and this history haunts the images it returns. Many prompted images for “Stereograms, 1900” do not even render the expected, highly formulaic structure of a stereogram (two of the same images, side by side, at a slight angle). It does, however, conjure images of those occupied lands. We see a visual echo of the colonizing gaze.

Result of prompting Midjourney with “Stereograms, 1900” in June 2024.

Result of prompting Midjourney with “Stereogram image circa 1900” in June 2024.

Images produced for the more generally used “stereoview,” even without the use of a date, still gravitate to a similar visual language. With “stereoview,” we are given the technical specifics of the medium. The content is more abstract: people are missing, but strongly suggested. These perhaps get me closest to the idea of a “haunted” image: a scene which suggests a history that I cannot directly access.

Perhaps there are two kinds of absences embedded in these systems. The people that colonizers want to erase, and then the evidence of the colonizers themselves. Crucially, this gaze haunts these images.

Here are four sets of two pairs.

Four images produced with the prompt “Stereoview” in Midjourney, June 2024.

These styles are embedded into the prompt for the technology of image capture, the stereogram. The source material is inscribed with the gaze that controlled this apparatus. The method of that inscription — the stereogram — inscribes this material into the present images. The history is loaded into the keyword and its neighboring associations in the vector space. History becomes part of the churn. These are new old images, built from the associations of a single word (stereoview) into its messy surroundings.

It’s important to remember that the images above are not documents of historical places or events. They’re “hallucinations,” that is, they are a sample of images from a spectrum of possible images that exists at the intersection of every image labeled “stereoview.” But “stereoview” as a category does not isolate the technology from how it was used. The technology of the stereogram, or the stereoviewer, was deeply integrated into regimes of war, racial hierarchies, and power. The gaze, and the subject, are both aggregated, diffused, and made to emerge through the churning of the model.

Technologies of Flattening

The stereoview and the diffusion models are both technologies of spectacle, and the affordance of power to those who control it is a similar one. They are technologies for flattening, containing, and re-contextualizing the world into a specific order. As viewers, the generated image is never merely the surfaces of photography churned into new, abstract forms that resemble our prompts. They are an activation of the model’s symbolic regime, which is derived from the corpus of images because it has the power to isolate images from their meaning.

AI has the power of finance, which enables computational resources that make obtaining 5 billion images for a dataset possible, regardless of its impact on local environments. It has the resources to train these images; the resources to recruit underpaid labor to annotate and sort these images. The critiques of AI infrastructure are numerous.

I am most interested here in one form of power that is the most invisible, which is the power of naturalizing and imposing an order of meaning through diffused imagery. The machine controls the way language becomes images. At the same time, it renders historical documentation meaningless — we can generate all kinds of historical footage now.

These images are reminders of the ways data colonialism has become embedded within not merely image generation but the infrastructures of machine learning. The scholar Tiara Roxanne has been investigating the haunting of AI systems long before me. In 2022 Roxanne noted that,

“in data colonialism, forms of technological hauntings are are experienced when Indigenous peoples are marked as ‘other,’ and remain unseen and unacknowledged. In this way, Indigenous peoples, as circumscribed through the fundamental settler-colonial structures built within machine learning systems, are haunted and confronted by this external technological force. Here, technology performs as a colonial ghost, one that continues to harm and violate Indigenous perspectives, voices, and overall identities” (49).

AI can ignore “the traces and residues of material relations” (Fuller and Weizman) as it reduces the image to its surfaces instead of the constellations of power that structured the original material. These images are the product of imbalances of power in the archive, and whatever interests those archives protected are now protected by an impenetrable, uncontestable, automated set of decisions steered by the past.

Image produced with the prompt “Stereoview” in Midjourney, June 2024.

The Abstracted Colonial Subject

What we see in the above images are an inscription by association. The generated image, as a type of machine learning system, matters not only because of how it structures history into the present. It matters because it is a visualization that reaches to something far greater about automated decision making and the power it exerts over others.

These striations of power in the archive or museum, in the census or the polling data, in the medical records or the migration records, determine what we see and what we do not. What we see in generated images must contort itself around what has been excluded from the archives. What is visible is shaped by the invisible. In the real world, this can manifest as families living on a street serving as an indication of those who could not live on that street. It could be that loans granted by an algorithmic assessment always contain an echo of loans that were not approved.

The synthetic image visualizes these traces. They churn the surfaces, not the tangled reality beneath them. The images that emerge are glossy, professional, saturated. Hiding behind these products by and for the attention economy is the world of the not-seen. What are our obligations as viewers to the surfaces we churn when we prompt an image model? How do we reconcile our knowledge of context and history with the algorithmic detachment of these automated remixes?

The media scholar Roland Meyer writes that,

“[s]omewhere in the training data that feeds these models are photographs of real people, real places, and real events that have somehow, if only statistically, found their way into the image we are looking at. Historical reality is fundamentally absent from these images, but it haunts them nonetheless.”

In a seance, you raise spirits you have no right to speak to. The folly of it is the subject of countless warnings in stories, songs and folklore.

What if we took the prompt so seriously? What if typing words to trigger an image was treated as a means of summoning a hidden and unsettled history? Because that is what the prompt does. It agitates the archives. Sometimes, by accident, it surfaces something many would not care to see. Boldly — knowing that I am acting from a place of privilege, and power, I ask the system to return “the abstracted colonial subject of photography.” I know I am conjuring something I should not be.

My words are transmitted into the model within a data center, where they flow through a set of vectors, the in-between state of thousands of photographs. My words are broken apart into key words — “abstracted, colonial, colonial subject, subject, photography.” These are further sliced into numerical tokens to represent the mathematical coordinates of these ideas within the model. From there, these coordinates offer points of cohesion which are applied to find an image within a jpg of digital static. The machine removes the noise toward an image that exists in the overlapping space of these vectors.

Avery Gordon, whose book Ghostly Matters is a rich source of thinking for this research, writes:

“… if there is one thing to be learned from the investigation of ghostly matters, it is that you cannot encounter this kind of disappearance as a grand historical fact, as a mass of data adding up to an event, marking itself in straight empty time, settling the ground for a future cleansed of its spirit” (63).

If history is present in the archives, the images churned from the archive disrupt our access to the flow of history. It prevents us from relating to the image with empathy, because there is no single human behind the image or within it. It’s the abstracted colonial gaze of power applied as a styling tool. It’s a mass of data claiming to be history.

A result from Midjourney for the prompt “abstracted colonial subject of photography” from May 2024.

Human and Mechanical Readings

I hope you will indulge me as my eye wanders through the resulting image.

I am struck by the glossiness of it. Midjourney is fine-tuned toward an aesthetic dataset, leaning into images found visually appealing based on human feedback. I note the presence of palm trees, which brings me to the Caribbean Islands of St. Croix where The Crying Child photograph was taken. I see the presence of barbed wire, a signifier of a colonial presence.

The image is a double exposure. It reminds me of spirit photography, in which so-called psychic photographers would surreptitiously photograph a ghostly puppet before photographing a client. The image of the “ghost” was superimposed on the film to emerge in the resulting photo. These are associations that come to my mind as I glance at this image. I also wonder about what I don’t know how to read: the style of the dress, the patterns it contains, the haircut, the particulars of vegetation.

We can also look at the image as a machine does. Midjourney’s describe feature will tell us what words might create an image we show it. If I use it with the images it produces, it offers a kind of mirror-world insight into the relationship between the words I’ve used to summon that image and the categories of images from which it was drawn.

To be clear, both “readings” offer a loose, intuitive methodology, keeping in the spirit of the seance — a Ouija board of pixel values and text descriptors. They are a way in to the subject matter, offering paths for more rigorous documentation: multiple images for the same prompt, evaluated together to identify patterns and the prevalence of those patterns. That reveals something about the vector space.

Here, I just want to see something, to compare the image as I see it to what the machine “sees.”

The image returned for the abstract colonial subject of photography is described by Midjourney this way:

“There is a man standing in a field of tall grass, inverted colors, tropical style, female image in shadow, portrait of bald, azure and red tones, palms, double exposure effect, afrofuturist, camouflage made of love, in style of kar wai wong, red and teal color scheme, symmetrical realistic, yellow infrared, blurred and dreamy illustration.”

My words produced an image, and then those words disappeared from the image that was produced. “Colonized Subject” is adjacent to the words the machine does see: “tall grass,” “afrofuturism,” “tropical.” Other descriptions recur as I prompt the model over and over again to describe this image, such as “Indian.” I have to imagine that this idea of colonized subjects “haunts” these keywords. The idea of the colonial subject is recognized by the system, but shuffled off to nearest synonyms and euphemisms. Might this be a technical infrastructure through which the images are haunted? Could certain patterns of images be linked through unacknowledged, invisible categories the machine can only indirectly acknowledge?

I can only speculate. That’s the trouble with hauntings. It’s the limit to drawing any conclusions from these observations. But I would draw the reader’s attention to an important distinction between my actions as a collage artist and the images made by Midjourney. The image will be interpreted by many of us, who will find different ways to see it, and a human artist may put those meanings into adjacency through conscious decisions. But to create this image, we rely solely on a tool for automated churning.

We often describe the power of images in terms of what impact an image can have on the world. Less often we discuss the power that impacts the image: the power to structure and give the image form, to pose or arrange photographic subjects.

Every person interprets an image in different ways. A machine makes images for every person from a fixed set of coordinates, its variety constrained by the borders of its data. That concentrates power over images into the unknown coordination of a black box system. How might we intervene and challenge that power?

The Indifferent Archivist

We have no business of conjuring ghosts if we don’t know how to speak to them. As a collage artist, “remixing” in 2016 meant creating new arrangements from old materials, suggesting new interpretations of archival images. I was able to step aside — as a white man in California, I would never use the images of colonized people for something as benign as “expressing myself.” I would know that I could not speak to that history. Best to leave that power to shift meanings and shape new narratives to those who could speak to it. Nonetheless, it is a power that can be wielded by those who have no rights to it.

Yes, by moving any accessible image from the online archive and transmuting it into training data, diffusion models assert this same power. But it is incapable of historic acknowledgement or obligation. The narratives of the source materials are blocked from view, in service to a technically embedded narrative that images are merely their surfaces and that surfaces are malleable. At its heart is the idea that the context of these images can be stripped and reduced into a molding clay, for anyone’s hands to shape to their own liking.

What matters is the power to determine the relationships our images have with the systems that include or exclude. It’s about the power to choose what becomes documented, and on what terms. Through directed attention, we may be able to work through the meanings of these gaps and traces. It is a useful antidote to the inattention of automated generalizations. To greet the ghosts in these archives presents an opportunity to intervene on behalf of complexity, nuance, and care.

That is literal meaning of curation, at its Latin root: “curare,” to care. In this light, there is no such thing as automated curation.

Reclaiming Traceability

In 2021, Magda Tyzlik-Carver wrote “the practice of curating data is also an epistemological practice that needs interventions to consider futures, but also account for the past. This can be done by asking where data comes from. The task in curating data is to reclaim their traceability and to account for their lineage.”

When I started the “Ghost Stays in the Picture” research project, I intended to make linkages between the images produced by these systems and the categories within their training data. It would be a means of surfacing the power embedded into the source of this algorithmic churning within the vector space. I had hoped to highlight and respond to these algorithmic imaginaries by revealing the technical apparatus beneath the surface of generated images.

In 2024, no mainstream image generation tool offers the access necessary for us to gather any insights into its curatorial patterns. The image dataset I initially worked with for this project is gone. Images of power and domination were the reason — specifically, the Stanford Internet Observatory’s discovery of more than 3,000 images in the LAION 5B dataset depicting abused children. Realizing this, the churn of images became visceral, in the pit of my stomach. The traces of those images, the pain of any person in the dataset, lingers in the models. Perhaps imperceptibly, they shape the structures and patterns of the images I see.

In gathering these images, there was no right to refuse, no intervention of care. Ghosts, Odumosu writes, “make their presences felt, precisely in those moments when the organizing structure has ruptured a caretaking contract; when the crime has not been sufficiently named or borne witness to; when someone is not paying attention” (S299).

The training of Generative Artificial Intelligence systems has relied upon the power to automate indifference. And if synthetic images are structured in this way, it is merely a visualization of how “artificial intelligence systems” structure the material world when carelessly deployed in other contexts. The synthetic image offers us a glimpse of what that world would look like, if only we would look critically at the structures that inform its spectacle. If we can read algorithmic decision-making a lapse in care, a disintegration of accountability, we might see fresh pavement has been poured onto sacred land.

This regime of Artificial Intelligence is not an inevitability. It is not even a single ideology. It is a computer system, and computer systems, and norms of interaction and participation with those systems, are malleable. Even with training datasets locked away behind corporate walls, it might still be possible “to insist on care where there has historically been none” (Odumosu S297), and by extension, to identify and refuse the automated inscription of the colonizing ghost.

This post concludes my research work at the Flickr Foundation, but I am eager to continue it. I am seeking publishers of art books, or curators for art or photographic exhibitions, who may be interested in a longer set of essays or a curatorial project that explores this methodology for reading AI generated images. If you’re interested, please reach out to me directly: eryk.salvaggio@gmail.com.

Two black and white photographs of shadows mingling with painted road crossings. On the left is a human photograph taken from Flickr, on the right, an image generated with Stable Diffusion 1.6. (Flickr image, “Ombre” by Renaud LEON, CC-BY-NC-SA).

“Today the photograph has transformed again.” – David A. Shamma, in a blog post announcing the YFCC100M dataset.

In part one of this series, I wrote about the differences between archives, datasets, and infrastructures. We explored the movement of images into archives through the simple act of sharing a photograph in an online showcase. We looked at the transmutation of archives into datasets — the ways those archives, composed of individual images, become a category unto themselves, and analyzed as an object of much larger scale. Once an archive becomes a dataset, seeing its contents as individual pieces, each with its own story and value, requires a special commitment to archival practices.

Flickr is an archive — a living and historical record of images taken by people living in the 21st century, a repository for visual culture and cultural heritage. It is also a dataset: the vast sum of this data, framed as an overwhelming challenge for organizing, sorting, and contextualizing what it contains. That data becomes AI infrastructure, as datasets made to aid the understanding of the archive become used in unexpected and unanticipated ways.

In this post, I shift my analysis from image to archive to dataset, and trace the path of images as they become AI infrastructure — particularly in the field of data-driven machine learning and computer vision. I’ll again turn to the Flickr archive and datasets derived from it.

99.2 Million Rows

A key case study is a collection of millions of images shared in June 2014. That’s when Yahoo! Labs released the YFCC100M dataset, which contained 99.2 million rows of metadata describing photos by 578,268 Flickr members, all uploaded to Flickr between 2004 and 2014 and tagged with a CC license. The dataset contained information such as photo IDs, URLs, and a handful of metadata such as the title, tags, description. I believe that the YFCC100M release was emblematic of a shift in the public’s — and Silicon Valley’s — perception of the visual archive into the category of “image datasets.”

Certainly, it wasn’t the first image dataset. Digital images had been collected into digital databases for decades, usually for the task of training image recognition systems, whether for handwriting, faces, or object detection. Many of these assembled similar images, such as Stanford’s dogs dataset or NVIDIA’s collection of faces. Nor was it the first transition that a curated archive made into the language of “datasets.” For example, the Tate Modern introduced a dataset of 70,000 digitized artworks in 2013.

What made YFCC100M interesting was that it was so big, but also diverse. That is, it wasn’t a pre-assembled dataset of specific categories, it was an assortment of styles, subject matter, and formats. Flickr was not a cultural heritage institution but a social media network with a user base that had uploaded far more images than the world’s largest libraries, archives, or museums. In terms of pure photography, no institution could compete on scale and community engagement.

The YFCC100M includes the description, tags, geotags, camera types, and links to 100 million source images. As a result, we see YFCC100M appear over and over again in papers about image recognition, and then image synthesis. It has been used to train, test, or calibrate countless machine vision projects, including high-rated image labeling systems at Google and OpenAI’s CLIP, which was essential to building DALL-E. Its influence in these systems rivals that of ImageNet, a dataset of 14 million images which was used as a benchmark for image recognition systems, though Nicolas Maleve notes that nearly half of ImageNet’s photos came from Flickr URLs. (ImageNet has been explored in-depth by Kate Crawford and Trevor Paglen.)

10,000 Images of San Francisco

It is always interesting to go in and look at the contents of a dataset, and I’m often surprised how rarely people do this. Whenever we dive into the actual content of datasets we discover interesting things. The YFCC100M dataset contains references to 200,000 images by photographer Andy Nystrom alone, a prolific street photographer who has posted nearly 8 million images to Flickr since creating their account in 2008.

The dataset contains more than 10,000 images each of London, Paris, Tokyo, New York, San Francisco, and Hong Kong, which outnumber those of other cities. Note the gaps here: all cities of the Northern hemisphere. When I ask Midjourney for an image of a city, I see traces of these locations in the output.

Generated by Eryk Salvaggio with Midjourney 6.

Images of cities generated with Midjourney 6. The concept of a city is built from thousands of images of cities — how might Flickr’s datasets have influenced which cities are manifested more visibly in these renderings? Or are Flickr’s datasets simply a way to understand the distribution of that data more generally?

Are these strange hybrids a result of the prevalence of Flickr in the calibration and testing of these systems? Are they a bias accumulated through the longevity of these datasets and their embeddedness into AI infrastructures? I’m not confident enough to say for sure. But missing from the images produced from the generic prompt “city” are traces of what Midjourney considers an African city. What emerges are not shiny, glistening postcard shots or images that would be plastered on posters by the tourist bureau. Instead, they seem to affirm the worst of the colonizing imagination: unpaved roads, cars broken down in the street. The images for “city” are full of windows reflecting streaks of sunlight; for “African city,” these are windows absent of glass.

“A prompt about a ‘building in Dakar’ will likely return a deserted field with a dilapidated building while Dakar is a vibrant city with a rich architectural history,” notes the Senegalese curator Linda Dounia. She adds: “For a technology that was developed in our times, it feels like A.I. has missed an opportunity to learn from the fraught legacies that older industries are struggling to untangle themselves from.”

Images produced with Midjourney in June 2024 for "Photographs of an African City."

Images produced with Midjourney in June 2024 with the prompt, “Photograph of an African city.”

Beyond the training data, these legacies are also entangled in digital infrastructures. We know images from Flickr have come to shape the way computers represent the world, and how we define tests of AI-generated output as “realistic.” These definitions emerge from data, but also from infrastructures of AI. Here, one might ask if the process of calibrating images to places has been so centered on the geographic regions where Flickr has access to ample images: 10,000 images each from cities of the Northern Hemisphere. These created categories for future assessment and comparison.

A 1 million photo sample of the 48 million geotagged photos from the dataset plotted around the globe. Creative Commons License One Million Creative Commons Geo-tagged Photos by aymanshamma on Flickr.

An infographic from the original blog post about the YFCC100M dataset. Denser green areas show a greater density of the 48 million geotagged photos from the dataset plotted around the globe. This infographic was made from a 1 million image sample. Creative Commons License One Million CC-BY-NC: Geo-tagged Photos by aymanshamma on Flickr.

What we see in those images of an “African city” are what we don’t see in the data set. What we see is what is what is missing from that infrastructure: 10,000 pictures of Lagos or Nairobi. When these images are absent from the training data, they influence the result. When they are absent from the classifiers and calibration tools, that absence is entrenched.

The sociologist Avery Gordon writes of ghosts, too. For Gordon, the ghost, or the haunting, is “the intermingling of fact, fiction and desire as it shapes the personal and social memory … what does the ghost say as it speaks, barely, in the interstices of the visible and invisible?” In these images, the ghost is the image not taken, the history not preserved, the gaps that haunt the archives. It’s clear these absences move into the data, too, and that the images of artificial intelligence are haunted by them, conjuring up images that reveal these gaps, if we can attune ourselves to see them.

There is a limit to this kind of visual infrastructural analysis of image generation tools — its reliance on intuition. There is always a distance between these representations of reality in the generated image and the reality represented in the datasets. Hence our language of the seance. It is a way of poking through the uncanny, to see if we can find its source, however remote the possibility may be.

Representativeness

We do know a few things, in fact. We know this dataset was tested for representativeness, that was defined as how evenly it aligned with Flickr’s overall content — not the world at large. We know, then, that the dataset was meant to represent the broader content of Flickr as a whole, and that the biases of the dataset — such as the strong presence of these particular cities — are therefore the biases of Flickr. In 2024, an era where images have been scraped from the web wholesale for training data without warning or permission, we can ask if the YFCC100M dataset reflected the biases we see in tools like DALL-E and Midjourney. We can also ask if the dataset, in becoming a tool for measuring and calibrating these systems, may have shaped those biases as a piece of data infrastructure.

As biased data becomes a piece of automated infrastructure, we see biases come into play from factors beyond just the weights of the training data. It also comes into play in the ways the system maps words to images, sorts out and rejects useful images, and more. One of the ways YFCC100M’s influence may shape these outcomes is through its role in training the OpenAI tool I mentioned earlier, called CLIP.

CLIP looks at patterns of pixels in an image and compares them to labels for similar sets of pixels. It’s a bridge that connects the descriptions of images to words of a user’s prompt. CLIP is a core connection point between words and images within generative AI. Recognizing whether an image resembles a set of words is how researchers decided what images to include in training datasets such as LAION 5B.

Calibration

CLIP’s training and calibration dataset contained a subset of YFCC100M, about 15 million images out of CLIP’s 400 million total. But CLIP was calibrated with, and its results tested against, classifications using YFCC100M’s full set. By training and calibrating CLIP against YFCC100M, that dataset played a role in establishing the “ground truth” that shaped CLIP’s ability to link images to text.

CLIP was assessed on its ability to scale the classifications produced by YFCC100M and MS-COCO, another dataset which consisted entirely of images downloaded from Flickr. The result is that the logic of Flickr users and tagging has become deeply embedded into the fabric of image synthesis. The captions created by Flickr members modeled — and then shaped — the ways images of all kinds would be labeled in the future. In turn, that structured the ways machines determined the accuracy of those labels. If we want to look at the infrastructural influences of these digital “ghosts in the machine,” then the age, ubiquity, and openness of the YFCC100M dataset suggests it has a subtle but important role to play in the way images are produced by diffusion models.

We might ask about “dataset bias,” a form of bias that doesn’t refer to the dataset, or the archive, or the images they contain. Instead, it’s a bias introduced through the simple act of calling something a dataset, rather than acknowledging its constitutive pieces. This shift in focus shifts our relationship to these pieces, asking us to look at the whole. Might the idea of a “dataset” bias us from the outset toward ignoring context, and distract us from our obligation of care to the material it contains?

From Drips Comes the Deluge

The YFCC100M dataset was paired with a paper, YFCC100M: The New Data in Multimedia Research, which focused on the needs of managing visual archives at scale. YFCC100M was structured as an index of the archive: a tool for generating insight about what the website held. The authors hoped it might be used to create tools for handling an exponential tide of visual information, rather than developing tools that contributed to the onslaught.

The words “generative AI” never appear in the paper. It would have been difficult, in 2014, to anticipate that such datasets would be seen through a fundamental shift from “index” to “content” for image generation tools. That is a shift driven by the mindset of AI companies that rose to prominence years later.

In looking at the YFCC100M dataset and paper, I was struck by the difference between the problems it was established to address and the eventual, mainstream use of the dataset. Yahoo! released the paper in response to the problems of proprietary datasets, which they claimed were hampering replication across research efforts. The limits on the reuse of datasets also meant that researchers had to gather their own training data, which was a time consuming and expensive process. This is what made the data valuable enough to protect in the first place — an interesting historical counterpoint to today’s paradoxical claim by AI companies that image data is both rare and ubiquitous, essential but worth very little.

Attribution

Creative Commons licensed pictures were selected for inclusion in order to facilitate the widest possible range of uses, noting that they were providing “a public dataset with clearly marked licenses that do not overly impose restrictions on how the data is used” (2). Only a third of the images in the dataset were marked as appropriate for commercial use, and 17% required only attribution. But, in accordance with the terms of the Creative Commons licenses used, every image in the dataset required attribution of some kind. When the dataset was shared with the public, it was assumed that researchers would use the dataset to determine how to use the images contained within it, picking images that complied with their own experiments.

The authors of the paper acknowledge that archives are growing beyond our ability to parse them as archivists. But they also acknowledge Flickr as an archive, that is, a site of memory:

“Beyond archived collections, the photostreams of individuals represent many facets of recorded visual information, from remembering moments and storytelling to social communication and self-identity [19]. This presents a grand challenge of sensemaking and understanding digital archives from non-homogeneous sources. Photographers and curators alike have contributed to the larger collection of Creative Commons images, yet little is known on how such archives will be navigated and retrieved, or how new information can be discovered therein.”

Despite this, there was a curious contradiction in the way Yahoo! Labs structured the release of the dataset. The least restrictive license in the dataset is CC-BY — images where the license requires attribution. Nearly 68 million out of the 100 million images in the dataset specifically stated there could be no commercial use of their images. Yet, the dataset itself was then released without any restrictions at all, described as “publicly and freely usable.”

The dataset of YFCC100M wasn’t the images themselves. It was the list of images, a sample of the larger archive that was made referenceable as a way to encourage researchers to make sense of the scale of image hosting platforms. The strange disconnect between boldly declaring the contents as CC-licensed, while making them available to researchers to violate those licenses, is perhaps evident only in hindsight.

Publicly Available

It may not have been a deliberate violation of boundaries so much as it was a failure to grapple with the ways boundaries might be transgressed. The paper, then, serves as a unique time capsule for understanding the logic of datasets as descriptions of things, to the understanding of datasets as the collection of things themselves. This was a logic that we can see carried out in the relationships that AI companies have to the data they use. These companies see the datasets as markedly different from the images that the data refers to, suggesting that they have the right to use datasets of images under “fair use” rules that apply to data, but not to intellectual property.

This breaks with the early days of datafication and machine learning, which made clearer distinctions between the description of an archive and the archive itself. When Stability AI used LAION 5B as a set of pointers to consumable content, this relationship between description and content collapsed. What was a list of image URLs and the text describing what would be found there became pointers to training data. The context was never considered.

That collapse is the result of a set of a fairly recent set of beliefs about the world which increasingly sees the “image” as an assemblage of color information paired with technical metadata. We hear echoes of this in the defense of AI companies, that their training data is “publicly available,” a term with no actual, specific meaning. OpenAI says that CLIP was trained on “text–image pairs that are already publicly available” in its white paper.

In releasing the dataset, Yahoo’s researchers may have contributed to a shift: from understanding online platforms through the lens of archives, into understanding them as data sources to be plundered. Luckily, it’s not too late to reassert this distinction. Visual culture, memory, and history can be preserved through a return to the original mission of data science and machine learning in the digital humanities. We need to make sense of a growing number of images, which means preserving and encouraging new contexts and relationships between images rather than replacing them with context-free abstractions produced by diffusion models.

Generative AI is a product of datasets and machine learning and digital humanities research. But in the past ten years, data about images and the images themselves have become increasingly interchangeable. Datasets were built to preserve and study metadata about images. But now, the metadata is stripped away, aside from the URL, which is used to analyze an image. The image is translated into abstracted information, ignoring where these images came from and the meaning – and relationships of power – that are embedded into what they depict. In erasing these sources, we lose insight into what they mean and how they should be understood: whether an image of a city was taken by a tourism board or an aid agency, for example. The biases that result from these absences are made clear.

Correcting these biases requires care and attention. It requires pathways for intervention and critical thinking about where images are sourced. It means prioritizing context over convenience. Without attention to context, correcting the source biases are far more challenging.

Data Casts Shadows

In my fellowship with the Flickr Foundation, I am continuing my practice with AI, looking at the gaps between archives and data, and data and infrastructures, through the lens of an archivist. It is a creative research approach that examines how translations of translations shape the world. I am deliberately relying on the language of intuition — ghosts, hauntings, the ritual of the seance — to encourage a more human-scaled, intuitive relationship to this information. It’s a rebuttal of the idea that history, documentation, images and media can be reduced to objective data.

That means examining the emerging infrastructure built on top of data, and returning to the archival view to see what was erased and what remains. What are the images in this dataset? What do they show us, and what do they mean? Maleve writes that to become AI infrastructure, a Flickr image is pulled from the context of its original circulation, losing currency. It is relabeled by machines, and even the associations of metadata itself become superfluous to the goal of image alignment. All that matters is what the machine sees and how it compares to similar images. The result is a calibration: the creation of a category. The original image is discarded, but the residue of whatever was learned lingers in the system.

While writing this piece, I became transfixed by shadows within synthetic images. Where does the shadow cast in an AI generated image come from? They don’t come from the sun, because there is no sunlight within the black box of the AI system. Despite the hype, these models do not understand the physics of light, but merely produce traces of light abstracted from other sources.

Unlike photographic evidence, synthetic photographs don’t rely on being present to the world of light bouncing from objects onto film or sensors. The shadows we see in an AI generated image are the shadows cast by other images. The generated image is itself a shadow of shadows, a distortion of a distortion of light. The world depicted in the synthetic image is always limited to the worlds pre-arranged by the eyes of countless photographers. Those arrangements are further extended and mediated as these data shadows stretch across datasets, calibration systems, engineering decisions, design choices and automated processes that ignore or obscure their presence.

Working Backward from the Ghost

When we don’t know the source of decisions made about the system, the result is unexplainable, mysterious, spooky. But image generation platforms are a series of systems stacked on top of one another, trained on hastily assembled stews of image data. The outcomes go through multiple steps of analysis and calibration, outputs of one machine fed into another. Most of these systems are built upon a subset of human decisions scaled to cover inhuman amounts of information. Once automated, these decisions become disembodied, influencing the results.

In part 3 – the conclusion of this series – I’ll examine a means of reading AI generated images through the lens of power, hoping to reveal the intricate entanglement of context, control, and shifting meanings within text and image pairs. Just as shadows move across the AI generated image, so too, I propose, does the gaze of power contained within the archives.

I’ll attempt to trace the flow of power and meaning through datasets and data infrastructures that produce these prompted images, working backwards from what is produced. Where do these training images come from? What stories and images do they contain, or lack? In some ways, it is impossible to parse, like a ghost whose message from the past is buried in cryptic riddles. A seance is rarely satisfying, and shadows disappear under a flashlight.

But it’s my hope that learning to read and uncover these relationships improves our literacy about so-called AI images, and how we relate to them beyond toys for computer art. Rather, I hope to show that these are systems that perpetuate power, through inclusion and exclusion, and the sorting logic of automated computation. The more we automate a system, the more the system is haunted by unseen decisions. I hope to excavate the context of decisions embedded within the system and examine the ways that power moves through it. Otherwise, the future of AI will be dictated by what can most easily be forgotten.

Read part three here.

***

I would be remiss not to point to the excellent and abundant work on Flickr as a dataset that has been published by Katrina Sluis and Nicolas Malevé, whose work is cited here but merits a special thank you in shaping the thinking throughout this research project. I am also grateful to scholars such as Timnit Gebru, whose work on dataset auditing has deeply informed this work, and to Dr. Abeba Birhane, whose work on the content of the LAION 5B dataset has inspired this creative research practice.

In the images accompanying this text, I’ve paired images created in Stable Diffusion 1.6 for the prompt “Flickr.com street shadows.” They’re paired with images from actual Flickr members. I did not train AI on these photos, nor did I reference the originals in my prompts. But by pairing the two, we can see the ways that the original Flickr photos might have formed the hazy structures of those generated by Stable Diffusion.

Two black and white photographs of shadows mingling with sidewalks. On the left is a human photograph taken from Flickr, on the right, an image generated with Stable Diffusion 1.6. (Flickr image, “Life Goes On” by sinkdd, CC BY-NC-ND 2.0).

“The Absence Becomes the Thing.”
– Rindon Johnson, from The Law of Large Numbers

Every image generated by AI calls up a line of ghosts. They haunt the training data, where the contexts of photographs are reduced to the simplest of descriptions. They linger in the decisions of engineers and designers in what labels to use. The ghosts that haunt the generated image are hidden by design, but we can find them through their traces. We just need to know how to look.

As an artist, the images created by AI systems are rarely interesting to me solely as photographs. I find the absences that structure these images, and the stories told in the gaps, to be far more compelling. The images themselves recycle the tropes of their training data. By design, they lean into the most common patterns, changing the details like a lazy student changing the words of a plagiarized essay.

I don’t turn to generative AI for beautiful images. I look for evidence of ghosts.

What exactly is a ghost in an AI system? It’s a structure or decision that haunts information in barely discernible, even invisible, ways. Datasets are shaped by absences, and those absences shape the image. As a diffusion model seeks the path to an image, the absence of pathways constrains what is possible. We can read these paths by looking at AI images critically, addressing the negative space of what appears on our screens. Who are the people we don’t see? What are the stories these images cannot tell?

This can mean absences in representation. When we have thousands of photographs of white children tagged as “girls,” but few black children, black girls are absent from the images. Absence haunts the generated image, shaping it: we will see mostly white girls because black girls have been pushed to the edges. This is not just a glib example. The exact scenario is precisely what I found when I analyzed a dataset used for training image generation tools and automated surveillance systems in 2019. The pattern holds today. Victorian-era portraits of white girls are prevalent in the training data for generative AI systems such as Stable Diffusion. Black girls are absent, with highly sexualized images of adult women taking their place.

Infrastructure makes ghosts, too. We build complex systems one step at a time, like a set of intersecting hallways. Artificial Intelligence is, at its heart, a means of automating decisions. They carry decisions from the past into the future. Once we inscribe these decisions into code, the code becomes infrastructure, subsumed into a labyrinth made by assembling the code of others and code yet to be written. As we renovate these structures through new code or system upgrades, the logic of a particular path is lost. We may need to build new walls around it. But when we bury code, we bury decisions beneath a million lines of if/then statements, weights, and biases of machine learning. Unchallenged, the world that has slipped past us shapes the predictions of these systems in ways we cannot understand.

This is true of most data driven, automated systems, whether we are talking about resume filters or parole decisions. For the generated photograph, these decisions include how we test and calibrate image recognition systems, and how we iterate upon these systems with every new model launch and interface.

Diffusion models — at the core of image generation systems — are an entanglement of systems. It relies on one system to label images, examining how pixels are clustered and matching them with human descriptions. We relied on underpaid labor by humans to test these systems by comparing the results of that tool to what they saw themselves. These comparisons are recorded and integrated into the memory of the model. The actions of those people were fused into the infrastructure of the model, shaping decisions long after they stopped working on the dataset.

We tend to make up stories about synthetic images based on what’s depicted. That is the work of the human imagination: a way of making sense of media based on references we’ve seen before. That is a ghost story, too. But if we want to meet the ghosts that shape AI-generated images, we have to dig deeper into the systems that produce them. The AI ghost story is a story of the past reaching into the present, and to understand it, it helps to know the lineage of those decisions.

Image synthesis has a history, and that history lingers in black boxes of the neural nets as they shape noisy pixels into something recognizable. Part of that story is the datasets, but data is applied to a vast number of uses. One of those uses is training larger systems to sort and handle larger sums of data.

Data shapes data infrastructure. From small sets of data, patterns are found and applied to larger sums of data. These patterns are repeatedly invoked whenever we call upon these systems into the future. The source data is always an incomplete model of things. But nonetheless, it is applied to larger datasets, which inherit and amplify the gaps, absences, and decisions from the past.

This is part of my creative research work on the seance of the digital archive. It focuses not only on data, but the lineage of data and the decisions made using that data to shape larger systems. A key piece of this lineage, and one that merits deeper exploration, is Flickr.

The Archive and the Dataset

With the rise of generative AI, vast troves of visual, textual, and sonic cultural heritage data have been folded into models that create new images, text, even music. But images are host to a special kind of spectral illusion. Most images shared online were never intended to become “data,” and in many ways, this transformation into data is at odds with the real value at the heart of what these archives preserve.

What is the difference between an archive and a dataset? We are dealing with many levels of abstraction here: an archive consists of individual objects designed to serve some human purpose. These objects may then be curated into a collection. It may be a collection of pamphlets, political cartoons, or documentary photographs. It may be the amateur photographer aiming to preserve a snapshot of a birthday party whose daughter and granddaughter celebrated alongside one another. Flickr, as a photo-sharing website, is host to all of these. The miracle of data, compression, and the world wide web is that the same infrastructures can be shared for moments important to “history” but also to the individual. It preserves images from cultural heritage institutions and family beach outings alike.

Flickr is three things at once: an archive and a dataset, most famously. But it is also a kind of data infrastructure. Let’s explore these one by one.

Flickr is an archive. It is a website that preserves history. It holds digital copies of historical artifacts for individual reflection and context. Flickr is a website for memories, stored in its copies of images, snapshots, aids to the remembrance of personal stories. These are assembled into an archive, a collective photo album. Flickr as an archive is a place where the context of an individual item is preserved. But we make sense of this archive socially. Meanings change as users sort these images, tag them, and reuse them (with permission) across the web. The archive is a collection of images with their own history beyond the website itself.

Flickr is a dataset. Flickr images can be described, at scale, in pure numbers. In 2011, the web site claimed to have 6 billion images and more recently boasted of having “tens of billions” of photos, with estimates of 25 million uploads per day. By contrast, the largest widely used image dataset used in machine learning, LAION 5B, contains roughly 5.85 billion images. Flickr as a massive, expanding dataset poses a particular set of challenges in thinking about its future. One of these is the daunting task of sorting and understanding all of those images. The dataset, then, is really just the archive viewed through the abstraction of scale. Billions of images now seen as one data set, with each image merely a piece of the collective whole. As a dataset, we focus on the ways the entirety of that set can be preserved and understood.

But in shifting our lens of focus from an archive to a dataset, individual details become less important. In changing scales in this way, it’s important to move fluidly between them — much as we close one eye, then the other, as we look at the letters of the eye exam. If we want to tackle the myopia of design decisions, we must get used to shifting between these two views, rather than treating one as the sole way we see the world.

What does it mean for Flickr to be “infrastructure” for AI? It helps to define this slippery term, so I turn to a definition used by the Initiative for Public Digital Infrastructure at UMass Amherst:

“Infrastructures are fundamental systems that allow us to build other systems—new houses and businesses rely on the infrastructures of electric power lines, water mains, and roads—and infrastructures are often invisible so long as they work well.”

In the relationship to images in particular, Katrina Sluis describes the shift in meaning attributed to images as their context shifts from archive to data infrastructures:

“Photographic culture is now being sustained by a variety of agents who sit outside the traditional scope of cultural institutions. I’m thinking here of the computer scientist, web designer, Silicon Valley entrepeneur or Amazon Mechanical Turker. And of course, these are actors who are not engaged with photographic culture and the politics of representation, the history of photography or the inherent polysemy of the image. In the computer science lab, the photograph remains relatively uncomplicated – it is ultimately a blob of information – whether materialized as a “picture” or left latent as data.”

Flickr’s images played an important role in shaping image recognition systems at the outset, and in turn, image generation systems. As a result of this entrenchment of images into AI, many Flickr images have become a form of “accidental infrastructure” for AI. I should be clear that Flickr has not trained a generative AI model of its own for the production of new images, nor has it arranged, as of this writing, for the sale of images for use in image training.

When we examine Flickr as infrastructure, we will see that these two worlds — archive and dataset — have come to occupy the same space, complicating our understanding of them both. Flickr’s movement from archive to dataset in the eyes of AI training isn’t permanent. It reflects a historical shift in the ways people understand and relate to images. So it is worth exploring how that shift changes what we see, and how ghosts from the archive come to “haunt” the dataset. In establishing these two lenses of focus, we might find strategies of shifting between the two. This can help us better articulate the context of images that have built, and likely will continue to build, the foundations of generative AI systems and the images these systems produce.

Flickers in the Infrastructure

How did Flickr’s transition from archive to dataset allow it to become a piece of AI infrastructure?

It started with one of the first breakthroughs in AI image generation — StyleGAN 2. StyleGAN 2 could produce images of human faces that were nearly photorealistic. It was entirely a result of the FFHQ dataset, which NVIDIA made from 70,000 Flickr portraits of faces. NVIDIA’s dataset drew on photographs from Flickr and, notably, warned that the dataset would inherit Flickr’s biases. The FFHQ dataset also went on to be used for image labeling and face recognition technologies, too.

We can easily trace the influence of that dataset on the faces StyleGAN 2 produced. In 2019, I did my own analysis of that dataset, looking image by image at the collection. In so doing, I examined the dataset through the lens of an archive. I looked at it as a collection of individual photographs, and individual people. I discovered that less than 3% of the faces sampled from the dataset contained black women. As a result, the faces produced by the image model were less likely to generate faces of black women. When it did, they were less photorealistic than other faces. The absences were shaping what we saw.

If datasets are haunted, then the synthetic image is a seance — a way of generating a specter from the datasets. This word, specter, refers to both the appearance of a spirit, but also the appearance of an image, deriving from the Latin for spectrum. The synthetic image is a specter. It’s an image which appears from an unknown place. It is one slice from a spectrum of possible images associated with a prompt. Absences in the dataset constrained the output of possible images. Without black women in the dataset, black women were not in the images. This is one way absences can haunt the synthetic image.

But there is another case study worth exploring, which is the ways that Flickr haunts infrastructures of AI. How did the dataset shape the automated decision making processes that were then included in longer, more complex systems of image generation?

In part two of this blog post, we’ll look at YFCC100M, a dataset of 99.2 million photos released in June 2014. And we’ll look at the path it has taken as it moved the world’s relationship to this collection of Flickr images from images, into an archive, into a dataset. Along the way, we’ll see how that dataset, by becoming a go-to reference for calibration and testing of image recognition and synthesis, became infused into the infrastructures of generated images.

Prakash Krishnan, by Tamara Abdul Hadi

What’s drawn you to family archives?

For a class on research-creation methods (also referred to as arts-based research) I took in 2019, I had big ambitions of creating an experimental, non-narrative documentary using cellphone footage during a planned trip to my parents’ home country of Malaysia. Upon returning home and examining the footage, I unfortunately came to the realization that a combination of obsolete technology (an iPhone 4S in the year of the iPhone 11 – imagine!!!), corrupted sound, and sabotage by my own, unsteady hands rendered my footage unusable. Scrambling to find some way to complete my term project in the final weeks of the semester, I decided to undergo what I saw as an intrapersonal reflection via an investigation of my own family archives.

These “archives” are fairly small. Limited to the albums of photos my parents once carefully categorized and now just haphazardly store in a pile on a basement shelf. I confess that when living with my parents as a child, I was too self-centred to pay attention to any of the albums that didn’t include me. As the firstborn and only a year after my parents’ marriage, effectively I was prominently featured in all the albums except one. Paging through the album documenting my parents’ courtship, wedding, and first year of marriage, I was embarrassed by my shock of confronting these two people, whom I’ve evidently known my whole life, living this whole other life without me. As I passed on to the albums of my infancy, I became overcome with emotion seeing them making their life together, still virtually strangers having had an arranged marriage and finding themselves shortly thereafter in a new country, facing what I know now as the pressures that come with being not only new parents, but new immigrants, newly coupled, and struggling with finding lasting employment.

Inspired by my reaction to these albums, I planned on conducting an oral history interview with my parents. I wanted to know the people in these photos, what they were thinking, feeling, doing. Yet there was something holding me back. The photos were so intimate, often only one of them in the shot, as the other was behind the camera. These felt like private moments between the two of them that was solely theirs to wholly know and experience. Instead, I took a selection of these photos and wrote my own reflections. Searching back through my own memories of the rare times my parents spoke about their youth and the early years of their marriage, I pieced together a history of their early settlement and parenthood in Canada (circa 1991-1995) through written reflections and image descriptions I then inscribed on the digitized copies of the photos. I’m usually not a very emotionally expressive person, but I cried when I presented this to my class.

This experience fundamentally changed my relationship to photo archives and sowed the seeds for what would become my master’s thesis South Asian Instagram Community Archives: A Platform for Performance, Curation, and Identity as well as my approach to creative and poetic visual description for blind and low-vision communities as workshopped in the online exhibitions Audio Description in the Making and Air, River, Sea, Soil: A History of an Exploited Land.

Continuing this line of engagement along with my community archival engagement approaches prototyped in the community digital archive/exhibition project Things+Time,

What would you like to work on during your fellowship?

I will, over the course of my fellowship at the Flickr Foundation, work with two community organizations, one based in London, UK and one in Montreal, Canada to undergo a digital archival excavation workshop. Through a series of guided prompts and reflections, these community groups will decide specific search criteria in order to activate the Flickr archive, creating informal collections that respond to and inform the earlier reflections. Together, community members will create descriptions for the images that can dually serve as archival and visual descriptions for potential use in a future exhibition.

Using a “photovoice” methodology, participants will also be tasked with adding their own, related and annotated material to the Flickr archive in response to the collective and reflections feedback from the workshop.

The goals of this project are to engage community organizations with the creative possibilities afforded through archival and photo research as well as to unearth and activate some of the rich histories embedded in the Flickr archive.

A half-year of fellowing

Six months is not a very long time when you’re working with a team whose ambition is to secure a vast digital archive for the next 100 years. That’s only two quarters of a single calendar year; twelve two-week sprints, if you’re counting in agile time. It’s said that it takes around six months to feel competent in a new job. Six months is both no time at all and just enough time to get a grasp of your work environment.

Now it’s time for me, at the end of these six months, to look back at what I’ve done as the Flickr Foundation’s first Research Fellow. I’d like to share some reflections on my time here: what it was like to be a fellow, what I accomplished, and four crucial work-life lessons.

A bookshelf holds a small collection of art, tech and theory books. On top of the shelf is a coffee pot and mugs.

The heart of Flickr Foundation HQ. Coffee, library, sound system.

Sticky notes and a green poster on an office wall. The poster reads 'A Bao a Day' and features a drawing of a man in swim trunks drinking from a pineapple. A bao a day clearly leads to a good life.

Evidence of my work and many bao buns eaten.

A red plastic box of Celebrations chocolates next to a fruit bowl with apples

Rule #1 of Flickr Foundation: the Celebrations box shall never be empty.

How could I be a researcher?

When I joined Flickr Foundation, I was curious but apprehensive about what I could deliver for the organization. I didn’t come to this fellowship with an existing research practice. My last run of ‘proper’ education was at the turn of the century!

The thing about being the first is that there is no precedent, no blueprint I can follow to be a good fellow. And what flavor of research should I even be doing? I was open to the fellowship taking its own course, and my research and other day-to-day activities were largely led by the team’s 2024 plan. Having an open agenda meant that I could steer myself where I would be most useful, with an emphasis on the Flickr Commons and Content Mobility programs.

My mode of research followed my experience working in product management: conduct discovery and design research, synthesize it, and then extract and share insights that inform Flickr Foundation’s next steps. I had the freedom to adopt a research approach that clicked for me.

New perspectives, old inclinations

My main motivation in this fellowship was to get fresh perspectives on work I’d been doing for several years; that is, building digital services to help people find and use cultural heritage collections. I’d been doing it first as a librarian, then through managing several projects and products within the GLAM sector.

When working as a Product Manager on a multidisciplinary team within a large company, professional development was tightly tied to my role. When my daily focus was on team dynamic and delivering against a quarterly and annual plan, my self-development was geared toward optimizing those things. I didn’t get much time to look into the more distant future while working to shorter timelines. I relished the idea that I’d get some long-overdue, slow thinking time during my fellowship.

But I’ll be the first to admit that I couldn’t shake the habit of being in delivery mode. I was most drawn to practical, problem-solving work that was well within my comfort zone, like mapping an improved workflow for new members to register for Flickr Commons, or doing a content audit and making recommendations for improvements on flickr.org.

I stretched myself in other ways, particularly in my writing practice. I won’t claim to be prolific in publishing words to screen, but I wrote several things for Flickr Foundation, including:

co-writing a grant application to support the next stage of Data Lifeboat development
a blog post celebrating Flickr Commons’s 16th birthday
an overview of a big research study about collecting social media objects
internal research notes about responsible dataset documentation and applying CARE Principles to our work
and some personal blog posts, including one about my research topics and the one you’re reading now.

In the midst of all the doing, I read. A lot. (Here’s some of what I covered). Some days it felt like I was spending all day looking at text and video, which is both a luxury and a frustration. I felt an urge to put new knowledge into immediate practice. But when this knowledge is situated in the context of a 100-year plan, ‘immediate’ isn’t necessary. It’s a mindset shift that can challenge anyone’s output-focussed ways of working.

“Aim low, succeed often” and other career lessons

I want to conclude with four important takeaways from my fellowship. I’m confident they’ll find their way into the next stage of my career. Thank you to George Oates, Alex Chan, Jessamyn West, Ewa Spohn and all of the people I met through the Flickr Foundation for being part of this learning journey.

All technology is about people

Whether being a subject of data, being subjected to a technology, or even being the worker behind the tech, it’s always about people.

Take the Data Lifeboat project as an illustration. If the idea is to decouple Flickr photos from the flickr.com context in order to preserve them for the long term, you have to think about the people in the past who took the photos and their creative rights. What about the people depicted in the photos – did they consent? What would they think about being part of an archive? And what about the institutions who might be a destination for a Data Lifeboat? Guess what, it’s totally made of people who work in complex organizations with legacies, guidelines and policies and strategies of their own (and those governance structures made by people).

To build technology responsibly is a human endeavor, not a mere technical problem. We owe it to each other to do it well.

Slow down, take a longer view

One of the team mottos is “Aim low, succeed often”; meaning, take on only what feels sustainable and take your time to make it achievable. We’re working to build an organization that’s meant to steward our photographic heritage for a century and beyond. We can’t solve everything at once.

This was a much needed departure from the hamster-wheel of agile product delivery and the planning cycles that can accompany it. But it also fits nicely into product-focussed ways of working – in fact, it’s the ideal. Break down the hard stuff into smaller, achievable aims but have that long-term vision in mind. Work sustainably.

A strong metaphor brings people along

How do you describe a thing that doesn’t exist yet? If you want to shape a narrative for a new product or service, a strong metaphor is your friend. Simple everyday concepts that everyone can understand are perfect framing devices for novel work that’s complex and ambiguous.

The Data Lifeboat project is held afloat with metaphor. We dock, we build a network of Safe Harbors. If you’re a visual kind of person, the metaphor draws its own picture. Metaphor is helping us to navigate something that’s not perfectly defined.

Meeting culture is how you meet

Everyone in almost every office-based workplace complains about all of the meetings. Having spent time in a small office with three other Flickr Foundation staff, I’ve learned that meeting culture is just how you meet. If you sit around a table and have coffee while talking about what you’re working on, that’s a meeting. It’s been a joy to break out of the dominant, pre-scheduled, standing one-hour meeting culture. Let’s go for a walk. Let’s figure out the best way to get stuff done together.

This is me, captured by George at a Flickr Photowalk in London.

For all of us at Flickr Foundation, the idea of Flickr as an archive in waiting inspires our core purpose. We believe the billions of photos that have amassed on Flickr in the last 20 years have potential to be the material of future historical research. With so much of our everyday lives being captured digitally and posted to public platforms, we – both the Flickr Foundation and the wider cultural heritage community – have begun figuring out how to proactively gather, make available, and preserve digital images and their metadata for the long term.

In this blog post, I’m setting my sights beyond technology to consider the institutional and social aspects that enable the collection of digital photography from online platforms.

It’s made of people

Our Data Lifeboat project is now underway. Its goal is to build a mechanism to make it possible to assemble and decentralize slivers of Flickr photos for potential future users. (You can read project update 1 and project update 2 for the background). The outcome of the first project phase will be one or more prototypes we will show to our Flickr Commons partners for feedback. We’re already looking ahead to the second phase where we will work with cultural heritage institutions within the wider Flickr Commons network to make sure that anything we put into production best suits cultural heritage institutions’ real-world needs.

We’ve been considering multiple possible use cases for creating, and importantly, docking a Data Lifeboat in a safe place. The two primary institutional use cases we see are:

Cultural heritage institutions want to proactively collect born digital photography on topics relevant to their collections
In an emergency situation, cultural heritage institutions (and maybe other Flickr members) want to save what they can from a sinking online platform – either photos they’ve uploaded or generously saving whatever they can. (And let me be clear: Flickr.com is thriving! But it’s better to design for a worst-case scenario than to find ourselves scrambling for a solution with no time to spare.)

We are working towards our Flickr Commons members (and other interested institutions) being able to accept Data Lifeboats as archival materials. For this to succeed, “dock” institutions will need to:

Be able to use it, and have the technology to accept it
Already have a view on collecting born digital photography, and ideally this type of media is included in their collection development strategy. (This is probably more important.)

This isn’t just a technology problem. It’s a problem made of everything else the technology is made of: people who work in cultural heritage institutions, their policies, organizational strategies, legal obligations, funding, commitment to maintenance, the willing consent of people who post their photos to online platforms and lots more.

To preserve born digital photos from the web requires the enthusiastic backing of institutions—which are fundamentally social creatures—to do what they’re designed to do, which is to save and ensure access to the raw material of future research.

Collecting social photography

I’ve been doing some background research to inform the early stages of Data Lifeboat development. I came across the 2020 Collecting Social Photography (CoSoPho) research project, which set out to understand how photography is used in social media in order to be able to develop methods for collection and transmission to future generations. Their report, ‘Connect to Collect: approaches to collecting social digital photography in museums and archives’, is freely available as PDF.

The project collaborators were:

The Nordic Museum / Nordiska Museet
Stockholm County Museum / Stockholms Läns Museum
Aalborg City Archives / Aalborg Stadsarkiv
The Finnish Museum of Photography / Finland’s Fotografiska Museum
Department of Social Anthropology, Stockholm University

The CoSoPho project was a response to the current state of digital social photography and its collection/acquisition – or lack thereof – by museums and archives.

Implicit to the team’s research is that digital photography from online platforms is worth collecting. Three big questions were centered in their research:

How can data collection policies and practices be adapted to create relevant and accessible collections of social digital photography?
How can digital archives, collection databases and interfaces be relevantly adapted – considering the character of the social digital photograph and digital context – to serve different stakeholders and end users?
How can museums and archives change their role when collecting and disseminating, to increase user influence in the whole life circle of the vernacular photographic cultural heritage?

There’s a lot in this report that is relevant to the Data Lifeboat project. The team’s research focussed on ‘digital social photography’, taken to mean any born digital photos that are taken for the purpose of sharing on social media. It interrogates Flickr alongside Snapchat, Facebook, Instagram, as well as region-specific social media sites like IRC-Galleria (a very early 2000s Finnish social media platform).

I would consider Flickr a bit different to the other apps mentioned, only because it doesn’t address the other Flickr-specific use cases such as:

Showcasing photography as craft
Using Flickr as a public photo repository or image library where photos can be downloaded and re-used outside of Flickr, unlike walled garden apps like Instagram or Snapchat.

The ‘massification’ of images

The CoSoPho project highlighted the challenges of collecting digital photos of today while simultaneously digitizing analog images from the past, the latter of which cultural heritage institutions have been actively doing for many years. Anna Dahlgren describes this as a “‘massification’ of images online”. The complexities of digital social photos, with their continually changing and growing dynamic connections, combined with the unstoppable growth of social platforms, pose certain challenges for libraries, archives and museums to collect and preserve.

To collect digital photos requires a concerted effort to change the paradigm:

from static accumulation to dynamic connection
from hierarchical files to interlinked files
and from pre-selected quantities of documents to aggregation of unpredictably variable image and data objects.

Dahlgren argues that “…in order to collect and preserve digital cultural heritage, the infrastructure of memory institutions has to be decisively changed.”

The value of collecting and contributing

“Put bluntly, if images on Instagram, Facebook or any other open online platform should be collected by museums and archives what would the added value be? Or, put differently, if the images and texts appearing on these sites are already open and public, what is the role of the museum, or what is the added value of having the same contents and images available on a museum site?” (A. Dahlgren)

Those of us working in the cultural heritage sector can imagine many good responses to this question. At the Flickr Foundation, we look to our recent internet history and how many web platforms have been taken offline. Our digital lives are at risk of disappearing. Museums, libraries and archives have that long-term commitment to preservation. They are repositories of future knowledge, and expect to be there to provide access to it.

Cultural heritage institutions that choose to collect from social online spaces can forge a path for a multiplicity of voices within collections, moving beyond standardized metadata toward richer, more varied descriptions from the communities from which the photos are drawn. There is significant potential to collect in collaboration with the publics the institution serves. This is a great opportunity to design for a more inclusive ethics of care into collections.

But what about potential contributors whose photos are being considered for collection by institutions? What values might they apply to these collections?

CoSoPho uncovered useful insights about how people participating in community-driven collecting projects considered their own contributions. Contributors wanted to be selective about which of their photos would make it into a collection; this could be for aesthetic reasons (choosing the best, most representative photos) or concerns for their own or others’ anonymity. Explicit consent to include one’s photos in a future archive was a common theme – and one which we’re thinking deeply about.

Overall, people responded positively to the idea of cultural institutions collecting digital social photos – they too can be part of history!— and also think it’s important that the community from which those photos are drawn have a say in what is collected and how it’s made available. Future user researchers at Flickr Foundation might want to explore contributor sentiment even further.

What’s this got to do with Data Lifeboats?

As an intermediary between billions of Flickr photos and cultural heritage institutions, we need to create the possibilities for long-term preservation of this rich vein of digital history. These considerations will help us to design a system that works for Flickr members and museums and archives.

Adapting collection development practices

All signs point to cultural heritage institutions needing to prepare to take on born digital items. Many are already doing this as part of their acquisition strategies, but most often this born digital material comes entangled in a larger archival collection.

If institutions aren’t ready to proactively collect born digital material from the public web, this is a risk to the longevity of this type of knowledge. And if this isn’t a problem that currently matters to institutions, how can we convince them to save Flickr photos?

As we move into the next phase of the Data Lifeboat project, we want to find out:

Are Flickr Commons member institutions already collecting, or considering collecting, born digital material?
What kinds of barriers do they face?

Enabling consent and self-determination

CoSoPho’s research surfaced the critical importance of consent, ownership and self-determination in determining how public users/contributors engage with their role in creating a new digital archive.

How do we address issues of consent when preserving photos that belong to creators?
How do we create a system that allows living contributors to have a say in what is preserved, and how it’s presented?
How do we design a system that enables the informed collection of a living archive?
Is there a form of donor agreement or an opt-in to encourage this ethics of care?

Getting choosy

With 50 billion Flickr photos, not all of them visible to the public or openly licensed, we are working from the assumption that the Data Lifeboat needs to enable selective collecting.

Are there acquisition practices and policies within Flickr Commons institutions that can inform how we enable users to choose what goes into a Data Lifeboat?
What policies for protecting data subjects in collections need to be observed?
Are there existing paradigms for public engagement for proactive, social collecting that the Data Lifeboat technology can enable?

Co-designing usable software

Cultural heritage institutions have massively complex technical environments with a wide variety of collection management systems, digital asset management systems and more. This complexity often means that institutions miss out on chances to integrate community-created content into their collections.

The CoSoPho research team developed a prototype for collecting digital social photography. That work was attempting to address some of these significant tech challenges, which Flickr Foundation is already considering:

Individual institutions need reliable, modern software that interfaces with their internal systems; few institutions have internal engineering capacity to design, build and maintain their own custom software
Current collection management systems don’t have a lot of room for community-driven metadata; this information is often wedged in to local data fields
Collection management systems lack the ability to synchronize data with social media platforms (and vice versa) if the data changes. That makes it more difficult to use third-party platforms for community description and collecting projects.

So there’s a huge opportunity for the Flickr Foundation to contribute software that works with this complexity to solve real challenges for institutions. Co-design–that is, a design process that draws on your professional expertise and institutional realities–is the way forward!

We need you!

We are working on the challenge of keeping Flickr photos visible for 100 years and we believe it’s essential that cultural heritage institutions are involved. Therefore, we want to make sure we’re building something that works for as many organizations as possible – both big and small – no matter where you are in your plans to collect born digital content from the web.

If you’re part of the Flickr Commons network already, we are planning two co-design workshops for Autumn 2024, one to be held in the US and the other likely to be in London. Keep your eyes peeled for Save-the-Date invitations, or let us know you’re interested, and we’ll be sure to keep you in the loop directly.

This work is supported by the National Endowment for the Humanities.

Introducing Eryk Salvaggio, 2024 Research Fellow

Eryk Salvaggio is a researcher and new media artist interested in the social and cultural impacts of artificial intelligence. His work, which is centered in creative misuse and the right to refuse, critiques the mythologies and ideologies of tech design that ignore the gaps between datasets and the world they claim to represent. A blend of hacker, policy researcher, designer and artist, he has been published in academic journals, spoken at music and film festivals, and consulted on tech policy at the national level.

Ghosts in the Archives Become Ghosts in the Machines

I’m honored to be joining the Flickr Foundation to imagine the next 100 years of Flickr, thinking critically about the relationships between datasets, history, and archives in the age of generative AI.

AI is thick with stories, but we tend to only focus on one of them. The big AI story is that, with enough data and enough computing power, we might someday build a new caretaker for the human race: a so-called “superintelligence.” While this story drives human dreams and fears—and dominates the media sphere and policy imagination—it obscures the more realistic story about AI: what it is, what it means, and how it was built.

The invisible stories of AI are hidden in its training data. They are human: photographs of loved ones, favorite places, things meant to be looked at and shared. Some of them are tragic or traumatic. When we look at the output of a large language model (LLM), or the images made by a diffusion model, we’re seeing a reanimation of thousands of points of visual data — data that was generated by people like you and me, posting experiences and art to other people over the World Wide Web. It’s the story of our heritage, archives and the vast body of human visual culture.

I approach generated images as a kind of seance, a reanimation of these archives and data points which serve as the techno-social debris of our past. These images are broken down — diffused — into new images by machine learning models. But what ghosts from the past move into the images these models make? What haunts the generated image from within the training data?

Seance of the Digital Image, Eryk Salvaggio

In “Seance of the Digital Image” I began to seek out the “ghosts” that haunt the material that machines use to make new images. In my residency with the Flickr Foundation, I’ll continue to dig into training data — particularly, the Flickr Commons collection — to see the ways it shapes AI-generated images. These will not be one to one correlations, because that’s not how these models work.

So how do these diffusion models work? How do we make an image with AI? The answer to this question is often technical: a system of diffusion, in which training images are broken down into noise and reassembled. But this answer ignores the cultural component of the generated image. Generative AI is a product of training datasets scraped from the web, and entangled in these datasets are vast troves of cultural heritage data and photographic archives. When training data-driven AI tools, we are diffusing data, but we are also diffusing visual culture.

Eryk Salvaggio: Flowers Blooming Backward Into Noise (2023) from ARRG! on Vimeo.

In my research, I have developed a methodology for “reading” AI-generated images as the products of these datasets, as a way of interrogating the biases that underwrite them. Since then, I have taken an interest in this way of reading for understanding the lineage, or genealogy, of generated images: what stew do these images make with our archives? Where does it learn the concept of what represents a person, or a tree, or even an archive? Again, we know the technical answer. But what is the cultural answer to this question?

By looking at generated images and the prompts used to make them, we’ll build a way to map their lineages: the history that shapes and defines key concepts and words for image models. My hope is that this endeavor shows us new ways of looking at generated images, and to surface new stories about what such images mean.

As the tech industry continues building new infrastructures on this training data, our window of opportunity for deciding what we give away to these machines is closing, and understanding what is in those datasets is difficult, if not impossible. Much of the training data is proprietary, or has been taken offline. While we cannot map generated images to their true training data, massive online archives like Flickr give us insight into what they might be. Through my work with the Flickr Foundation, I’ll look at the images from institutions and users to think about what these images mean in this generated era.

In this sense, I will interrogate what haunts a generated image, but also what haunts the original archives: what stories do we tell, and which do we lose? I hope to reverse the generated image in a meaningful way: to break the resulting image apart, tackling correlations between the datasets that train them, the archives that built those datasets, and the images that emerge from those entanglements.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

The Ghost Stays in the Picture, Part 3: The Power of the Image