Data Lifeboat Update 3

March has been productive. The short version is that it’s complicated, but we’re exploring happily and adjusting the scope in small ways to help simplify things. Let me summarise the main things we did this month.

Legal workshop

We welcomed two of our advisors—Neil from the Bodleian and Andrea from GLAM e-Lab—to our HQ to get into the nitty gritty of what a 50-year-old Data Lifeboat needs to accommodate. 

As we began the conversation, I centred us on the CARE Principles and asked that we always keep them in our sights for this work. The main challenges ahead are settling around three questions: how identity and the right to be forgotten must be expressed, how Flickr account holders can or should be identified, and whether some kind of external name-resolver service could help us. We think we should develop policies for Flickr members (on consent to be included in a Data Lifeboat), Data Lifeboat creators (on their obligations as creators), and Dock Operators (an operations manual and obligations for operating a dock). There may also be challenges ahead around database rights, but we don’t know enough yet to give a good update. We’d like a first-take legal framework for the Data Lifeboat system to be an outcome of these first six months.

Privacy & licensing

Privacy and licensing are central to Flickr, and we must do our utmost to respect them in all our work. It would be irresponsible for us to jettison the wishes encoded in those settings for our own convenience, tempting though that may be. It would certainly be easier to make Data Lifeboats containing any photos from anyone, but we must respect the wishes of Flickr creators throughout the creation process.

There are still big, unanswered questions about consent, and about how we might get millions of Flickr members to agree to participate and allow their pictures to be included in other people’s Data Lifeboats.

Extending the prototype Data Lifeboat sets 

Initially, we had planned to run this six-month prototype stage with just one test set of images: some or all of the Flickr Commons photographs. But to explore the challenges around privacy and licensing, we’ve decided to expand our set of working prototypes to also include the entire Library of Congress Flickr Commons account, and all the photos tagged “flickrhq”. That second set is something the Flickr Foundation may decide to collect for its own archive, and it contains photographs from different Flickr members who also happened to be Flickr staff, and who would therefore (theoretically) be more sympathetic to the consent question.

Visit to Greenwich

Ewa spotted that the Maritime Museum in Greenwich is currently showing an exhibition of ambrotype portraits of women in the RNLI, so we decided to take a day trip to see the portraits and poke around that brilliant museum. We ended up taking a boat from Greenwich to Battersea, which was a nice way to experience the Thames (and check out that boat’s life-saving capabilities).

Day Out: Maritime Museum & Lifeboats

The Data Lifeboat creation process

I found myself needing to sketch out what actually creating a Data Lifeboat could look like, particularly without using a command line, so we spent a while in front of a whiteboard getting that started.

At this point, we’re imagining a few key steps:

  1. The Query – “I want these photos” – is like a search. We could borrow from our existing Flinumeratr toy.
  2. The Results – Show the images and some metadata. It’s hard to show information about the set in aggregate at this stage (e.g., how many of the photos are under each license), but this could form a manifest for the Data Lifeboat.
  3. Agreement – We think the Data Lifeboat creator needs to agree to certain terms, written in simple, active language that echoes the CARE Principles, the Flickr API Terms of Service, and the Flickr Community Guidelines. We think this agreement should also be included in the Data Lifeboat it’s connected with.
  4. README / Note to the Future – we love the idea that the Data Lifeboat creator could add a descriptive narrative at this point, about why they are making this lifeboat, and for whom, but we recognised that this may not get done at all, especially if it’s too complicated or time-consuming. This is also a good spot to describe or configure warnings, timers, or other conditions needed for future access. Thanks also to two of our other advisors – Commons members Mary Grace and Alan – who shared with us their organisation’s policies on acquisitions for reference.
  5. Packaging – This would happen asynchronously and invisibly to the creator, with everything downloading in the background. We realised it could take days, especially if lots of Data Lifeboats are being made at once.
  6. Ready! – The Data Lifeboat creator somehow gets a note saying the Data Lifeboat is ready for download. We may need to consider keeping it available for only a short time.

Creation Schematic, 19th March
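Step 2 mentions showing the result set in aggregate, for example how many photos carry each license, and using that as the basis of a manifest. As a minimal sketch, assuming each query result is a hypothetical `Photo` record with a `license` string, the summary could be computed like this. The field names and the manifest layout are illustrative assumptions, not a Data Lifeboat format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical minimal record for one query result.
    photo_id: str
    owner: str
    license: str


def license_summary(results: list[Photo]) -> dict:
    """Summarise a result set in aggregate, as a possible starting point for a manifest."""
    counts = Counter(photo.license for photo in results)
    return {
        "photo_count": len(results),
        "owner_count": len({photo.owner for photo in results}),
        # most_common() orders licenses from most to least frequent
        "licenses": dict(counts.most_common()),
    }


results = [
    Photo("1", "member_a", "No known copyright restrictions"),
    Photo("2", "member_a", "No known copyright restrictions"),
    Photo("3", "member_b", "CC BY 2.0"),
]
print(license_summary(results))
```

A summary like this is cheap to compute before any images are downloaded, which fits the idea that Packaging (step 5) happens later, in the background.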

Emergency v Non-Emergency 

We keep coming up against this… 

The original concept of the Data Lifeboat is a response to the near-death experience that Flickr had in 2017 when its then-owner, Verizon/Yahoo, almost decided to vaporise it because they deemed it too expensive to sell (something known as “the cost of economic divestment”). In that kind of emergency, we’d want to save as much of this unique collection as possible, as quickly as possible, which means we’d need a million lifeboats full of pictures created more or less simultaneously, or at least within a relatively short period of time.

In the early days of this work, Alex said that the pressure of this kind of emergency would be the equivalent of being “hugged to death by the archivists,” as we all try, in very caring and responsible ways, to save as much as we can. And then there’s the bazillion-emergency-hits-to-the-API-connection problem (aka the “Thundering Herd” problem), which we do not yet have a solution for, and which is very likely to affect any other social media platform that may also want to explore this concept.

We’re connecting with the Flickr.com team to start discussing how to address this challenge. We’re beginning to think about how emergency selection might work, as well as the present and future challenges of establishing the identity of photo subjects and account owners. If lifeboats are ever needed, the millions that would be created would surely need the company’s support to launch.
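The Thundering Herd problem arises when huge numbers of clients hit the same API at the same moment, and then retry at the same moment when they fail. This is not a solution to the problem described above, which remains open; it only illustrates one standard client-side technique, exponential backoff with “full jitter”, that spreads retries out over time instead of letting them arrive in synchronized waves. The function name and parameters are hypothetical, and this does nothing about the server-side capacity question.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return a randomised wait, in seconds, before each retry attempt.

    "Full jitter": the delay before retry n is drawn uniformly from
    [0, min(cap, base * 2**n)], so clients that failed at the same moment
    spread their retries out instead of retrying in lockstep.
    """
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


# Two clients that fail at the same instant get different retry schedules.
client_a = backoff_delays(5, rng=random.Random(1))
client_b = backoff_delays(5, rng=random.Random(2))
print([round(d, 2) for d in client_a])
print([round(d, 2) for d in client_b])
```

The cap keeps any single wait bounded, while the random draw is what breaks up the herd: without it, every client would back off by exactly the same amounts and collide again.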

New! Flickr Commons Explorer

commons.flickr.org

At the Flickr Foundation, one of the goals we set early on when we took over responsibility for running the Flickr Commons program was to build an improved ‘discovery layer’ for the Commons collection.

We’re pleased to share with you a first look at our new Commons Explorer, available at commons.flickr.org.

We’ve built the explorer using the standard Flickr API, and have created a secondary database which is updated pretty regularly. (This is our way of saying that not all the data is live.) Please note that photos on display link back to flickr.com.

It’s a work in progress, but we wanted to show you how far we’ve got in this early version. We’ve prioritized being able to look across the Commons in an interface that’s richer than search results. We’re also surfacing activity levels across the collection, to show that there’s a ton of chatting happening, and new uploads all the time.

The views we’ve built so far:

Home page

This is a list of recent uploads from across the Commons collection, and a sample of our members.

Members

This is a list of all the Flickr Commons members, which you can sort in different ways. By default, members are sorted by their most recent upload, so you’ll see active members at a glance.

Each member has their own page, where you can see their popular tags, interesting photos, and recent uploads.
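The default ordering on the Members page amounts to sorting members by their latest upload time. A minimal sketch, assuming hypothetical `Member` records drawn from the secondary database mentioned above; the field names are illustrative and not the explorer’s actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Member:
    # Hypothetical record for one Flickr Commons member.
    name: str
    latest_upload: Optional[datetime]  # None if the member has no uploads yet


def sort_by_recent_activity(members: list[Member]) -> list[Member]:
    """Most recent uploader first; members with no uploads go last."""
    oldest = datetime.min.replace(tzinfo=timezone.utc)
    return sorted(
        members,
        key=lambda m: m.latest_upload if m.latest_upload is not None else oldest,
        reverse=True,
    )


members = [
    Member("Member A", datetime(2024, 1, 5, tzinfo=timezone.utc)),
    Member("Member B", None),
    Member("Member C", datetime(2024, 2, 20, tzinfo=timezone.utc)),
]
print([m.name for m in sort_by_recent_activity(members)])
# ['Member C', 'Member A', 'Member B']
```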

Conversations

For the first time ever, you can catch up on the last week’s conversations about photos in Flickr Commons. You’ll immediately see the fantastic community that’s grown up around photostreams like the National Library of Ireland’s, and get to know some of the volunteer researchers who inhabit the Commons and contribute their time and detective skills to enrich it.

Stats

Here we present activity across the collection, such as upload volumes, comments, and popular tags.

About

A simple static page which outlines what we’re doing. And finally…

Search!

We’ve made a bone-simple search for the explorer too, so you can quickly see a splat of pictures about just about anything. Even with a few million photos, there’s a huge range of tagging and other description happening. Jump into London, pie, Istanbul, and smiles, or just look for the magnifying glass in the top right of the nav bar.

We hope you enjoy exploring, and please let us know if you have ideas for how we can improve upon what’s there so far!

In other Flickr Commons news

We are working with the Flickr company to develop a new set of API methods the Foundation will be able to use to build the member management tools we need to really lean into rejuvenating the Commons, and especially to grow its membership. If we can introduce 5-10 new members to the program this year, we’ll be stoked! If more, we’ll be even stoked-er.

This will involve a new home for registrations of interest, and a smoother onboarding experience for new members. More generally, we’re looking forward to new insights into the overall health of the program through better views on activity (the beginnings of which you can see at commons.flickr.org).

If you’re from an existing member institution, or you’re curious about joining and sharing your historical photography collections, please let us know.

In our early research back in 2021, we noted that we wanted to get to know more of the volunteer community too, and to learn about their needs for participating with research and commentary. I’m pleased to report we’ve begun that, with our first interview with a prominent community member last week. (I was so excited I could barely talk, but Jessamyn wisely recorded the conversation and will be reporting on it soon.)

Introducing Eryk Salvaggio, 2024 Research Fellow

Eryk Salvaggio is a researcher and new media artist interested in the social and cultural impacts of artificial intelligence. His work, which is centered in creative misuse and the right to refuse, critiques the mythologies and ideologies of tech design that ignore the gaps between datasets and the world they claim to represent. A blend of hacker, policy researcher, designer and artist, he has been published in academic journals, spoken at music and film festivals, and consulted on tech policy at the national level.

Ghosts in the Archives Become Ghosts in the Machines

I’m honored to be joining the Flickr Foundation to imagine the next 100 years of Flickr, thinking critically about the relationships between datasets, history, and archives in the age of generative AI.

AI is thick with stories, but we tend to only focus on one of them. The big AI story is that, with enough data and enough computing power, we might someday build a new caretaker for the human race: a so-called “superintelligence.” While this story drives human dreams and fears—and dominates the media sphere and policy imagination—it obscures the more realistic story about AI: what it is, what it means, and how it was built.

The invisible stories of AI are hidden in its training data. They are human: photographs of loved ones, favorite places, things meant to be looked at and shared. Some of them are tragic or traumatic. When we look at the output of a large language model (LLM), or the images made by a diffusion model, we’re seeing a reanimation of thousands of points of visual data — data that was generated by people like you and me, posting experiences and art to other people over the World Wide Web. It’s the story of our heritage, archives and the vast body of human visual culture. 

I approach generated images as a kind of seance, a reanimation of these archives and data points which serve as the techno-social debris of our past. These images are broken down — diffused — into new images by machine learning models. But what ghosts from the past move into the images these models make? What haunts the generated image from within the training data? 

In “Seance of the Digital Image,” I began to seek out the “ghosts” that haunt the material that machines use to make new images. In my residency with the Flickr Foundation, I’ll continue to dig into training data (particularly the Flickr Commons collection) to see the ways it shapes AI-generated images. These will not be one-to-one correlations, because that’s not how these models work.

So how do these diffusion models work? How do we make an image with AI? The answer to this question is often technical: a system of diffusion, in which training images are broken down into noise and reassembled. But this answer ignores the cultural component of the generated image. Generative AI is a product of training datasets scraped from the web, and entangled in these datasets are vast troves of cultural heritage data and photographic archives. When training data-driven AI tools, we are diffusing data, but we are also diffusing visual culture.

Eryk Salvaggio: Flowers Blooming Backward Into Noise (2023) from ARRG! on Vimeo.

In my research, I have developed a methodology for “reading” AI-generated images as the products of these datasets, as a way of interrogating the biases that underwrite them. Since then, I have taken an interest in using this way of reading to understand the lineage, or genealogy, of generated images: what stew do these images make with our archives? Where does a model learn the concept of what represents a person, or a tree, or even an archive? Again, we know the technical answer. But what is the cultural answer to this question?

By looking at generated images and the prompts used to make them, we’ll build a way to map their lineages: the history that shapes and defines key concepts and words for image models. My hope is that this endeavor shows us new ways of looking at generated images, and surfaces new stories about what such images mean.

As the tech industry continues building new infrastructures on this training data, our window of opportunity for deciding what we give away to these machines is closing, and understanding what is in those datasets is difficult, if not impossible. Much of the training data is proprietary, or has been taken offline. While we cannot map generated images to their true training data, massive online archives like Flickr give us insight into what they might be. Through my work with the Flickr Foundation, I’ll look at the images from institutions and users to think about what these images mean in this generated era. 

In this sense, I will interrogate what haunts a generated image, but also what haunts the original archives: what stories do we tell, and which do we lose? I hope to reverse the generated image in a meaningful way: to break the resulting image apart, tackling correlations between the datasets that train them, the archives that built those datasets, and the images that emerge from those entanglements.

Data Lifeboat: NEH Grant Update 1

By Ewa Spohn

And we’re off! Thanks to the Digital Humanities Advancement Grant we were awarded by the National Endowment for the Humanities, our work on the Data Lifeboat, part of our Content Mobility program, has started. We’ll be posting an update for you each month.

hand-drawn sketch of a decentralization methodology, Feb 2nd, 2024

Excellent Lifeboat-related game and book brought in by Alex for our Kick-off

What’s a Data Lifeboat?

A quick recap for those not familiar with the concept, from our grant narrative:

“A Data Lifeboat is an archival piece of Flickr, not all of the 50 billion images and their metadata. For example, a Lifeboat might contain all the photos tagged with “sunflower” or all the Recipes to Share group submissions. Whatever facet of the data you can think of, you could generate a Data Lifeboat for it. We envision an archival sliver richer than a mere folder of JPGs: one where you can navigate the content to explore and understand its networked context. Even better, an archival sliver that is updated if things change at flickr.com. Our goals with this project are to create several rough prototypes of the software, develop a reasonably detailed understanding of the main technical challenges, prepare a survey of critical ongoing legal issues, and establish a robust design direction for further product development.”

This idea was born from two challenges: 1) Flickr contains a multitude of shared histories, yet it is owned and controlled by a corporation that could decide to close the service, which, as we’ve seen in the past, can result in the destruction of cultural heritage; and 2) the Flickr archive is huge and, in its current form, impossible for any one archival institution to take on.

Flickr’s 50 billion or so photos reflect our diverse heritage, traditions, and history back to us in a unique way. The collection is also born digital, a massive advantage over conventional archives, because the photographers usually describe their pictures themselves as they publish. The pictures are also enriched by the network of social activity that surrounds them, which – again – is unique to the Flickrverse. Finally, this kind of volume is astonishing: Flickr and other platforms like it are orders of magnitude larger than our biggest cultural collections to date.

At the Foundation, we believe we must begin to treat this collection as we would our ‘traditional’ great libraries, archives, and collections. Time is of the essence: the commercial platforms that host these kinds of huge collections can (and do) disappear, effectively sinking our heritage along with them. Our hope is that a Data Lifeboat will carry Flickr images away from the possibility of a sinking ship unscathed. Our future plans include developing the idea of a “dock” in a “safe harbour” – somewhere specific for the Data Lifeboat to land and be preserved.

The scope of the grant

We’re using the NEH grant to create two identical prototype Data Lifeboats containing a selection of the Flickr Commons. This will (hopefully) be a richer archival format that allows for the exploration of content within its networked context, and one that can be updated when changes are made on flickr.com. Importantly, we want to place these two Lifeboats in two different places, a proxy for our developing goal of “safe harbours” for them.

This phase of the project, making a demonstrable prototype, or prototypes, is scheduled to end mid-year.

Our crew

It’s an exciting and completely new thing, and working on it is a multidisciplinary team drawn from both the Flickr Foundation and our Flickr Commons members and advisors:

  • George Oates, Project Director, who provides strategic and design input, and financial oversight
  • Alex Chan, Tech Lead, who is developing the core of our prototypes
  • Jessamyn West, Community Manager, who leads our communication with the Flickr Commons collaborators, the project advisors, as well as broader audiences
  • Ewa Spohn, Project Manager, who ensures the team sticks to the plan. And budget.

We’re excited to engage some of our Flickr Commons members directly for the first time, too. The Flickr Foundation team will be joined by staff from three of our member institutions:

And finally, our advisors, who bring a wide range of experience and knowledge and will help us make sure we build stuff for the long term:

Kick-off? Done.

We’re about to have our first all-hands meeting, although in late January we took advantage of Dietrich’s short visit to London to hold our first face-to-face workshop. 

Jenn, Ewa, George, Alex, Stef, and Dietrich (the photographer) at our kick-off at HQ

Over coffee and sugary snacks, we spent two days exploring decentralized storage and how it could be applied to archiving digital content, and thinking through a possible schema for the data in a Data Lifeboat.

Emerging questions

Many, many (#many) questions arose (for which we currently have no answers), for example:

  • Is a Data Lifeboat launched in response to an emergency or as part of regular housekeeping?
  • What must a Data Lifeboat contain? Could it initially be just a manifest, with the images (which are large and expensive to process) added later?
  • Who decides what goes in (and out of) a Data Lifeboat, and to what extent should that feel like an active selection?
  • Where are the ‘edges’ of the network surrounding a Flickr photo, and what is a holistic archive?
  • What existing digital asset management formats could (should) a Data Lifeboat be consistent with for it to be ‘docked’ successfully?

We were also very pleasantly surprised by the power of our lifeboat metaphor and how far we could stretch it to help coordinate our thinking! And thanks again to Dietrich for sharing time with us to crack the project open.

Next steps

Next up is our first all-hands meeting to bring the whole project team up to speed with the plan. That’ll be followed by a deep-dive review of digital asset management systems in cultural institutions and a survey of the legal rocks that a Data Lifeboat may encounter. We think that will give us enough to define some high-level requirements for the prototypes, so that development proper can start towards the end of the month.

Somewhere among all that we’re also planning a team expedition to a lifeboat museum to learn more about how lifeboats work in the physical world, but more about that in another blog post…

This work is supported by the National Endowment for the Humanities.

Welcome, Susan!

Introducing Susan Mernit, Our New Development Lead

Hello, Flickr family and friends! I’m Susan Mernit, stepping into the role of Development Lead for the Flickr Foundation. My journey with Flickr began in 2004, in the vibrant early days of digital photography. With nearly 5,000 snapshots, capturing everything from adventurous trips to China, Korea, and Peru to countless moments at tech gatherings, Flickr has been my digital photo album. Looking back on those days, it’s not only the images that resonate, but also the stories they tell and the community they’ve fostered.

Before joining this brilliant team, I served as Executive Director of The Crucible in Oakland, California, an innovative hub for artisan arts, and co-founded Hack the Hood, a nonprofit that helps low-income youth of color build skills for tech careers. My very first full-time job was as a community manager at a poetry organization, and I worked my way through college in the library. With a history in the tech world, including time at Yahoo that overlapped with the Flickr acquisition, my career has been shaped around community engagement, open source, and product innovation.

So what am I going to do exactly?

Working alongside George, our visionary Executive Director, my goal is clear: to ensure the Flickr Foundation secures the resources to turn our 100-year plan into a 100-year reality. From cultivating relationships with foundations and corporate partners to reaching out to our global community of individual supporters, my job is to help build a sustainable future for the Foundation.

Beyond my professional life, I find balance and strength in weightlifting, Iyengar yoga, and hiking around the SF Bay Area. I am a compulsive reader, enjoying literary fiction, biographies, and books about tech, economics, and business. The app I use most on my phone is a US library platform called Libby. I welcome recommendations for great reads, so let me know when you have one.

Our plan for 2024: Flickr Commons & Data Lifeboat & the 100-year Plan

Find out more about our nefarious schemes for the coming year…

When I do planning, I usually carve it up along three axes: Projects, Pipeline, and People. I want to keep our project list very short in 2024. That allows us to focus more deeply, I think, and spend time thinking and waxing and wandering a bit as we map the new terrain of our mission, to keep Flickr images visible for 100 years.

Projects

There are three main flows of project work for the team:

  1. Flickr Commons nurturing and growing
  2. Start Data Lifeboat
  3. Continue 100-year plan ideation and workshopping

Flickr Commons

Flickr Commons turned 16 years old last week. To celebrate, we launched the first instantiation of a new front door which lives at commons.flickr.org. The intent is to help Commons fans explore the different members’ collections more easily, and get a sense of recent activity across the aggregate. We hope to do another handful of releases over this year and beyond.

The other good news is that we’re nearly, finally, ready to welcome new members into the program. The software that supported new registrations and members had decayed a bit over the last decade, so, working with the company team—thanks Ruppel et al—we’ve co-designed a new set of Commons-specific APIs that will help the Foundation really lean into supporting Flickr Commons members from now on.

We are going to build: 1) a new registration form, 2) improved onboarding resources/workflow, 3) the new discovery layer you can now see at commons.flickr.org, and 4) better admin tools for the team to watch over the health of the program, and the happiness of our members. This will all be rolling out in the first half of this year. I don’t have a date for our first new tranche of members, but rest assured, we’ll let you know!

Later in the year, we want to find out a lot more about how Flickr members interact with Flickr Commons, and see if we can help them more easily keep track of their input and progress. If you fit into this group, we’d like to know you!

Data Lifeboat

Last year, we applied to the National Endowment for the Humanities (NEH) to develop a first set of prototypes for our Data Lifeboat concept. That’s the idea that we should actually plan for a possible end of flickr.com by developing “lifeboats” that can carry Flickr photos to other places if the big ship goes down. It was gratifying that the NEH decided to support this first block of work.

Our framing for the grant is to create two identical lifeboats containing Flickr pictures, “objective metadata” like EXIF, and a first crack at “social metadata” (the stuff that is only created on Flickr), because we think that’s essential for the longer-term contextual, archival framing of a Flickr photo’s existence. After all, on Flickr (and off) a photo is a social object that is discussed, arranged, annotated, pointed at, and displayed, and EXIF data (the data created when a digital camera takes a photograph) falls short.

We’re planning to post NEH-grant-specific updates to the blog at the end of each month, so stay tuned for that. (I’d better write that next!)
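To make the distinction between the two kinds of metadata concrete, here is one hypothetical way a single photo’s entry in a Lifeboat might keep “objective” EXIF data separate from Flickr-specific social metadata. The class and field names are assumptions for illustration only, not the prototype’s schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class LifeboatPhoto:
    # Hypothetical schema: one entry per photo in a Data Lifeboat.
    photo_id: str
    title: str
    license: str
    # "Objective" metadata, e.g. EXIF written by the camera.
    exif: dict[str, str] = field(default_factory=dict)
    # "Social" metadata that only exists because the photo lives on Flickr.
    tags: list[str] = field(default_factory=list)       # described
    comments: list[Comment] = field(default_factory=list)  # discussed
    albums: list[str] = field(default_factory=list)     # arranged
    notes: list[str] = field(default_factory=list)      # annotated


photo = LifeboatPhoto(
    photo_id="12345",
    title="Sunflowers",
    license="CC BY 2.0",
    exif={"Make": "Canon", "DateTimeOriginal": "2008:07:14 10:32:00"},
    tags=["sunflower", "summer"],
    comments=[Comment(author="member_b", text="Lovely colours!")],
)
print(json.dumps(asdict(photo), indent=2))
```

Serialising to plain JSON is one way to keep such records readable without special software, which matters for something meant to last decades.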

The 100-year plan

I don’t have a structure or plan written yet. But I’ve really enjoyed all the discussions I’ve had about the idea, and especially the various workshops we’ve run with different groups. Basically, the workshop is called How to write a 100-year plan, my opening gambit is “I don’t know, what do you think?”, and conversation ensues.

We do hope to get that workshop into a form you might be able to run without us. We’ll let you know about that too.

Pipeline

We’re just over one year old, having launched officially in November 2022. We’ve had an amazing start, thanks to support from SmugMug and our first cornerstone funder, Filecoin Foundation for the Decentralized Web. Since then, we’ve figured out how to accept donations of cash online via Stripe, and even stock donations! We’ve sketched out the grants we’re planning to apply for too.

People

Ewa Spohn, who also helped write the NEH grant for Data Lifeboat, has joined the crew to manage the project. She brings a background in mechanical engineering, program management, and people-arranging, and we’re lucky to have her! Welcome, Ewa!

We’ve brought on a new part-time team member to help wrangle our Pipeline work, Susan Mernit. (Check out her sledgehammer!!) A veteran of the tech industry, Susan changed gears to lead two non-profits in California, to great success. She’s now working with nonprofits to help shore up their development plans and strategy, and we’re very glad she’s come on board to support us.

And, in case you missed it, we’re hiring: Our first job ad for this year is Archivist. It’s live now, closing January 31st.

Welcome, Jenn!

Meet the Foundation’s first ever Research Fellow!

It is with great pleasure that I introduce you to the Flickr Foundation’s inaugural research fellow, Jenn. In her own words…

Hi, I’m Jenn Phillips-Bacher, the Flickr Foundation’s first-ever Research Fellow. I’ve been a Flickr user since 2007 when my first public photos were taken on a point-and-shoot digital camera. Oh, how the quality of photos has improved since then! It’s an absolute marvel to be able to trawl decades’ worth of (ever-improving) photography, still, in one place.

Before joining Flickr Foundation, I was most recently a Product Manager at Wellcome Collection, working to make its library and archive collections accessible to as many people as possible. I’ve also recently been a content strategist at the UK’s Government Digital Service where I focussed on tagging and taxonomies to help people find stuff. I’ve also been a web editor, project manager, reference librarian and technology trainer, all within the GLAM (that’s galleries, libraries, archives and museums) world.

My modus operandi for the 20+ years of my career has been to 1) find interesting work to do with kind people and 2) labor for the public good. That’s why I am delighted and honored to be part of Flickr Foundation’s efforts to preserve and sustain our digital heritage.

So what does it mean to be a research fellow?

Given my career history, I’d never considered that I could be a Research Fellow. I used to think research fellowships were reserved for academics (“real” researchers), which I resolutely am not. I’m still figuring out what it does mean to be a research fellow, but here’s where I’ve settled for now: a research fellowship allows me to take time out of normal life for learning and thinking while offering a practical benefit to the Flickr Foundation. That means I’ll use my research skills honed as a librarian and product manager to seek out existing knowledge and expertise, connecting the dots along the way, in order to help shape the Flickr Foundation’s work.

As the fellowship progresses, I’ll write more about what it’s like to move from a digital practitioner role into a Research Fellow role.

My research focus

My research is aimed at the Content Mobility program where I’m specifically interested in how we might design a Data Lifeboat. Not only the logistics of creating a portable archive of any facet of Flickr, but also how to plan for a digital collection’s ‘good ending’. I’ve always been interested in the idea of digital weeding—removing digital collections that no longer serve their purpose, as librarians do with physical materials. As we become more aware of the environmental impact of any digital activity, including online access and long-term preservation, we need to be even more intentional with what we save and what we let go.

As a complementary bit of research, I’ll be digging into the carbon costs of digital collections. I’m curious to see whether there’s something useful to do here that would help the GLAM sector make carbon-conscious digital collection decisions. (If you or anyone you know is already doing this work, I’d love to meet you/them!)

What else? When not working, I can be found nosing around galleries and museums and perambulating around cities in search of human-friendly architecture and good cafes. And like anyone who’s ever lived in Chicago, I have Opinions on hot dogs.

Superdawg drive-in

Photo by jordanfischer, CC BY 2.0.

Welcome to the team, Alex!

I’m very excited to introduce you to Alex Chan, who joined us this week as the Foundation’s first Tech Lead.

We’ve known since Day 1 that we wanted the Flickr Foundation to make things, and not just talk about things. It’s an important way for us to express our mission and long-term hope. We know it’s a huge challenge to make Flickr images visible for 100 years, and, while technology is certainly a big factor, meaningful future-proofing of our approaches, tools, and documentation will also be key.

That’s why I was so excited when Alex Chan applied for the Tech Lead position. They’ve joined us from Wellcome Collection, where they led software engineering for digital preservation efforts for several years. We knew we needed an engineer who actually enjoys documentation and writing code that’s clear, tested, and designed to be re-run by someone else. The code we write today will become the foundation stones of our future approaches, and the Tech Lead must stay focused on that at all times.

Apart from writing great code, Alex is also into noodling about with complex cross-stitching, and we’re already working on a first “toy” we’re hoping to publish very soon, but I’ll leave Alex to tell you about that.

Welcome, Alex!

A millions-of-things pile: Why we need a Collection Development Policy for Flickr Commons

Flickr is a photo-sharing website and has always been about connecting people through photography. It is different from a generic image-hosting service. Flickr Commons, the program launched in 2008 for museums, libraries, and archives to share their photography collections, is different again: it’s about sharing photography collections with a very big audience, and providing tools to help people to contribute information and knowledge about the pictures, ideally to supplement whatever catalogue information already exists.

A collection development policy is a framework for information institutions like libraries, archives and museums to define what they collect, and importantly, what they don’t collect. It’s an important part of maintaining a coherent and valuable collection while trends and technologies change and advance around the organisation. We think it’s time for the Flickr Commons to have a policy like this.

As the Flickr Commons collection grows, we’re seeing all kinds of images in there: photographs, maps, documents, drawings, museum objects, book scans, and more. Therefore, one aspect of the policy is to ask our members to use Flickr’s “Content-Type” field to improve the way their images can be categorised and found in search.

Why are we asking Flickr Commons members to categorise their images?

Since the program launched in 2008, the Flickr Commons has grown to also include illustrations, maps, letters, book scans, and other imagery. The default setting for uploads across all accounts is content_type=Photo, so if you don’t alter that default for new uploads, every image is classified as a photo. This starts to break down if you upload, say, the Engrossed Declaration of Independence or a wood engraving of Bloodletting Instruments.
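To make that default concrete, here is a minimal Python sketch. It is a toy model, not Flickr’s code: the `ContentType` enum, the `Upload` class, the `upload_batch` helper and the sample titles are all hypothetical, and the enum simply lists the five types Flickr offers (Photos, Screenshots, Illustration/Art, Virtual Photos, Videos).

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    # The five content types Flickr offers; the string values are readable
    # labels for this sketch, not Flickr's internal codes.
    PHOTO = "Photo"
    SCREENSHOT = "Screenshot"
    ILLUSTRATION_ART = "Illustration/Art"
    VIRTUAL_PHOTO = "Virtual Photo"
    VIDEO = "Video"


@dataclass
class Upload:
    title: str
    # Unless the uploader overrides it, every new upload is a Photo.
    content_type: ContentType = ContentType.PHOTO


def upload_batch(titles, content_type=None):
    """Hypothetical helper: create uploads, optionally overriding the default."""
    if content_type is None:
        return [Upload(title) for title in titles]
    return [Upload(title, content_type) for title in titles]


# Made-up titles for images that are clearly not photographs.
not_photos = [
    "Wood engraving of Bloodletting Instruments",
    "Botanical plate from a 19th-century book",
]

untouched = upload_batch(not_photos)
corrected = upload_batch(not_photos, ContentType.ILLUSTRATION_ART)

print([u.content_type.value for u in untouched])  # ['Photo', 'Photo']
print([u.content_type.value for u in corrected])  # ['Illustration/Art', 'Illustration/Art']
```

Unless the uploader passes an explicit type, both engravings end up filed as Photos, right alongside actual photographs.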

One of the largest Flickr Commons accounts is the great and good British Library, which famously published 1 million illustrations into the program in 2013, announcing:

“The images themselves cover a startling mix of subjects: There are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of… We are looking for new, inventive ways to navigate, find and display these ‘unseen illustrations’.”

From “A million first steps” by Ben O’Steen, 12 December 2013

Because the default setting for uploads is content_type=Photo, every search on Flickr Commons was inundated with “the beige 19th Century.” Those images had, by default, been categorised as Photos, but were in fact millions of pictures from 17th-, 18th-, and 19th-century books.

Earlier this year, the British Library team changed the images in their account to “Illustration/Art” rather than Photos. But that had the effect of “hiding” their content from searches using the default settings. This unintentional hiding raised a little alarm among their followers (who were used to seeing the book scans in their search results), some of whom wrote in to ask what had happened. And rightly so, because neither we nor the search interface had explained it to them yet.

The Backstory

In any aggregated system of cultural materials, you get colossal variegation. Humans describe things differently, no matter how many professional standards we try to implement. Last year, in 2022, the Flickr Commons was mostly a vast swathe of images from scanned book pages. Not photographs, per se, or things created first as photographs. 

There have been two uploads into Flickr Commons of over one million things. The first was in 2013, by the British Library, whose intention was to ask the community to help describe the million or so book illustrations they had carefully organised with book-structure metadata and described using clever machine tags. The BL team also took care not to annoy the Flickr API spirits, pacing their uploads so as not to trigger any alerts. In the decade since, they have built a community around the collection, cultivating creative reuse of, inspiration from, and research into the imagery, primarily through the British Library Labs initiative.

The second gigantic upload, in 2014, was (also) mostly images cropped by a computer program. Written by a solo developer during a Yahoo Research fellowship, the code was run over an extensive collection of content from the Internet Archive (IA) book digitization program to crop out images from scanned book pages. Those images were shoved into flickr.com using the API. The developer immediately hit the free account limits, so they negotiated with Yahoo senior management for these millions of images to become part of the Flickr Commons program in an Internet Archive Book Images (IABI) account. Since the developer was also loosely associated with IA, IA agreed to be the institutional partner in the Flickr Commons. That’s a requirement of joining the program: the account must be held by an organisation, not an individual.

These two uploads utterly overwhelmed the smaller Flickr Commons photography collections, even though the two approaches were so different.

Here’s a graph from April 2022 data that shows all Commons members on the x-axis, and their upload counts on the y-axis.


The IABI account is 5x larger than all the other accounts combined. If you remove the two giants from the data, the average upload per account is just under 3,000 pictures.

These whopper accounts both have billions of views overall. These view counts are unsurprising, given that they completely dominated all search results in Flickr Commons. While the Flickr Commons’ first goal has always been to “increase public access to photography collections”, its secondary—and in my opinion, much more interesting—goal is to “provide a way for the public to contribute information.”

You can see from the two following graphs that a big photo count doesn’t imply deeper engagement. In fact, we’ve seen the opposite is true, and the Flickr Commons members who enjoy the strongest engagement are those who spend time and effort to engage. Drip-feeding content—and not dumping it all at once—will also help viewers to keep up and get a good view of what is being published.


The fifth account in the most-faved data is the fabulous National Library of Ireland, which had about 3,000 photos at the time and excels at community engagement, as its 181,000 faves demonstrate.


In the comments data, IABI ranks 21st (~3,000 comments) and the British Library 27th (~2,000). The top-commented accounts are all in a groove of stellar community engagement.

Employees working in small archives (or large ones, for that matter) simply cannot compete with a software program that auto-generates image crops from book scans, each with lengthy automated metadata. At the Flickr Foundation, we have a place in our hearts for the smaller cultural organisations and want to actively support their online engagement through the Flickr Commons program.

I remember when the IABI account went live. Even though I wasn’t working at Flickr or at the Flickr Foundation at the time, I thought it was a mistake to allow such a vast blast of not-photographs into the Flickr Commons, mainly because this second massive collection had been described so broadly that its images would turn up in every search.

Fast forward to last year, in April, when—as my strange first step as Executive Director—I decided in consultation and agreement with the staff at IA to act. We agreed to delete the gargantuan Internet Archive Book Images (IABI) account.

A couple of weeks later, people realised it had happened, and a riot of “Flickr is destroying the public domain” posts popped up. I had not prepared for this reaction, which was the opposite of the tone I want the Flickr Foundation to set! I’d consulted with the Internet Archive, and a consensus had been reached. But I was also ignorant of the community enjoying the IABI account. I had presumed there was no community engagement, since nobody had logged into the IABI account since shortly after the giant upload in 2014. That was a mistake, I readily admit, but in my defence, the IA team shared that same impression when we discussed it.

The lone developer (who didn’t work at IA) had uploaded the millions of book images and did not engage with the community. The images were generated from many different institutions’ collections digitised through the Internet Archive’s wonderful book scanning initiative. Unfortunately, correct attribution for each institution had not been included in the initial metadata produced for each image. (This was later rectified by a code rewrite by Smithsonian Libraries and Archives, with support from Flickr engineering.) In some cases, the content was known to have no copyright, so it didn’t fit the Flickr Commons’ “no known copyright restrictions” assertion and could (or should) have been declared public domain. Add to that the content_type=Photo declaration and broad, auto-generated metadata (along with some tagging to group images into their books, for example). In other words, a millions-of-things mess.

Despite my hesitation, we decided to restore the entire account. Restoring at this scale is an incredible engineering feat and an indication of the world-class team working behind the scenes at Flickr. We also set the correct content type and changed the licences on the restored images to CC0, as the Internet Archive does not claim any rights in them. This has the added benefit of classifying them more clearly for reuse.

What we are doing about it

We need to be more restrained when it comes to digital commonses. These huge piles of stuff sound great, but they are rarely made with care by people. They’re generated en masse by computers and thrown online. (As a related aside, look at the millions of licensed pieces of content that are mined and inhaled to improve AI programs while their licences are ignored.)

The British Library acknowledged this, asking for interaction and effort from interested people, and stated explicitly that their 1 million images were “wholly uncurated.” People ultimately enjoyed hunting around in a millions-of-things pile for illustrations of things and made some beautiful responses to them. Indeed, one person managed to add 45,000 tags to the British Library’s Flickr Commons content. 45,000!

Perhaps I’m about to contradict myself again, but I’ll say that this scale of access was good at a base level, at least for computers and computation. It wasn’t good inside the Flickr Commons program, though, and that’s why we need the Collection Development Policy: so we can encourage and nurture the seeing, enjoyment, and contributions to our shared photographic history that we always wanted.

And that’s why we’re drafting the new policy in collaboration with the membership, so we can help Flickr Commons members know how to hold the shape of the container we’ve created instead of bursting it. 

With thanks to Josh Hadro, Martin Kalfatovic, Nora McGregor, Mia Ridge, Alexis Rossi, and Jessamyn West for your time and feedback on this post.

Flickr Commons: About Content Type and Advanced Search

This is a sister post to A millions-of-things pile: Why we need a Collection Development Policy for Flickr Commons. We’re writing this because our new policy changes what turns up in Flickr Commons searches.

Images can be categorised as Photos, Screenshots, Illustration/Art, Virtual Photos, or Videos on Flickr. The default setting for uploads across all accounts is content_type=Photo, so if you don’t alter that default for new uploads, every image is classified as a photo. This starts to break down if you upload, say, the Engrossed Declaration of Independence or a wood engraving of Bloodletting Instruments.

Therefore, we’ve launched our new Collection Development Policy to ask Flickr Commons members to classify their images more specifically.

Default search settings

By default, Flickr search only shows two content types: Photos and Videos. That means if a Flickr Commons member changes the content type of their uploads to anything else, those images will drop out of the default search results.


This is the default setting: Photos and Videos
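A toy Python model of that default filter shows why re-categorised images seem to vanish. It is a sketch under stated assumptions: the `search` function, its substring matching and the sample titles are hypothetical; only the Photos-and-Videos default comes from the description above.

```python
from enum import Enum


class ContentType(Enum):
    PHOTO = "Photo"
    SCREENSHOT = "Screenshot"
    ILLUSTRATION_ART = "Illustration/Art"
    VIRTUAL_PHOTO = "Virtual Photo"
    VIDEO = "Video"


# The default search only includes Photos and Videos.
DEFAULT_SEARCH_TYPES = frozenset({ContentType.PHOTO, ContentType.VIDEO})


def search(items, text, content_types=DEFAULT_SEARCH_TYPES):
    """Return titles containing `text` whose content type is selected.

    A case-insensitive substring match stands in for real search ranking.
    """
    return [
        title
        for title, content_type in items
        if text.lower() in title.lower() and content_type in content_types
    ]


# Made-up sample records: (title, content type).
commons = [
    ("Portrait with a smile, 1920s", ContentType.PHOTO),
    ("A smile, wood engraving from an 1850s book", ContentType.ILLUSTRATION_ART),
]

print(search(commons, "smile"))
# ['Portrait with a smile, 1920s']
print(search(commons, "smile", {ContentType.PHOTO, ContentType.ILLUSTRATION_ART}))
# ['Portrait with a smile, 1920s', 'A smile, wood engraving from an 1850s book']
```

In this model, changing the content type deletes nothing; the engraving is simply filtered out until the searcher widens the set of content types.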

We know this can come as a surprise to viewers who were familiar with how things worked before we started asking Flickr Commons members to use the new policy. That surprise isn’t great, so we’re working to address it, including with the flickr.com Customer Support team to get documentation online.

Part of that work is to show how the search works, so you can broaden it to include other content types. To do this, you open up the Advanced Search panel—on the right, under the header search box—and look for the “Content” heading. You can select or remove the different types of content as you wish.


Here you see a different selection: Photos and Illustration/Art

If you want to share a list of search results that also includes, say, images cropped from page scans of old books (which would now be marked as content_type=Illustration/Art), note that these settings show up as parameters in the search URL when you change them, like this:

https://flickr.com/search/?is_commons=1&text=smile&content_types=0%2C2

The content_types=0%2C2 parameter at the end tells you the search is filtering for Photos (0) and Illustration/Art (2); %2C is the URL-encoded comma that separates them. So, as you adjust your content type settings, you can share URLs that take other people straight to the same results without them needing to adapt their Advanced settings.
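If you want to generate or decode these links programmatically, a small sketch using Python’s standard `urllib.parse` module might look like this. Only codes 0 (Photos) and 2 (Illustration/Art) are taken from the example URL; the codes for the other content types aren’t given here, so the sketch deliberately leaves them out, and the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Only the two codes confirmed by the example URL; other content types
# presumably have their own codes, which this sketch does not assume.
CONTENT_TYPE_CODES = {"Photos": 0, "Illustration/Art": 2}
CODE_NAMES = {code: name for name, code in CONTENT_TYPE_CODES.items()}


def commons_search_url(text, content_types):
    """Build a shareable Flickr Commons search URL (sketch)."""
    codes = ",".join(str(CONTENT_TYPE_CODES[name]) for name in content_types)
    # urlencode() percent-encodes the comma as %2C, matching the example URL.
    query = urlencode({"is_commons": 1, "text": text, "content_types": codes})
    return "https://flickr.com/search/?" + query


def content_types_in(url):
    """Decode the content_types parameter back into readable names."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs turns %2C back into a plain comma.
    raw = params.get("content_types", [""])[0]
    return [CODE_NAMES.get(int(c), f"unknown ({c})") for c in raw.split(",") if c]


url = commons_search_url("smile", ["Photos", "Illustration/Art"])
print(url)
# https://flickr.com/search/?is_commons=1&text=smile&content_types=0%2C2
print(content_types_in(url))
# ['Photos', 'Illustration/Art']
```

Building the URL from named types, rather than hand-typing the codes, keeps a shared link readable in your own scripts while producing exactly the parameter the search page expects.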

We know this is a bit fiddly, but your default settings—whether on upload or as you search—should stick if you ever adjust them.