A millions-of-things pile: Why we need a Collection Development Policy for Flickr Commons

Flickr is a photo-sharing website and has always been about connecting people through photography. It is different from a generic image-hosting service.

A collection development policy is a framework that information institutions like libraries, archives and museums use to define what they collect and, importantly, what they don’t. It’s an important part of maintaining a coherent and valuable collection as trends and technologies change and advance around the organisation.

As the Flickr Commons collection grows, we are starting to ask Flickr Commons members to use Flickr’s “Content-Type” field to improve the way their images can be categorised and found in search. It’s definitely tricky to impose a collection development policy on a moving vehicle, so please enjoy this background on why we’ve decided to do it.

Why are we asking Flickr Commons members to categorise their images?

Since the program launched in 2008, the Flickr Commons has grown beyond photographs to include illustrations, maps, letters, book scans, and other imagery. The default setting for uploads across all accounts is content_type=Photo, so unless you change that default for new uploads, every image is classified as a photo. This starts to break down if you upload, say, the Engrossed Declaration of Independence or a wood engraving of Bloodletting Instruments.

One of the largest Flickr Commons accounts is the great and good British Library, which famously published 1 million illustrations into the program in 2013, announcing:

“The images themselves cover a startling mix of subjects: There are maps, geological diagrams, beautiful illustrations, comical satire, illuminated and decorative letters, colourful illustrations, landscapes, wall-paintings and so much more that even we are not aware of… We are looking for new, inventive ways to navigate, find and display these ‘unseen illustrations’.”

A million first steps by Ben O’Steen, 12 December 2013

Because the default setting for uploads is content_type=Photo, every search on Flickr Commons was inundated with “the beige 19th Century.” Those images had been categorised as Photos by default, but they were actually millions of pictures from 17th-, 18th-, and 19th-century books.

We showed the draft Collection Development Policy to the team at the British Library and asked for their feedback. We were grateful when they made the switch we had hoped for, categorising their huge account as “Illustration/Art” rather than Photos. But that had the effect of “hiding” their content from searches that use the default settings. This unintentional hiding raised a little alarm among their followers (who were used to seeing the book scans in their search results), some of whom wrote to the BL team to ask what had happened. And rightly so, because neither we nor the search interface had explained it to them yet.

The Backstory

In any aggregated system of cultural materials, you get colossal variegation. Humans describe things differently, no matter how many professional standards we try to implement. Last year, in 2022, the Flickr Commons was mostly a vast swathe of images from scanned book pages. Not photographs, per se, or things created first as photographs. 

There have been two uploads into Flickr Commons of over one million things. 

  • The first was in 2013, by the British Library, whose intention was to ask the community to help describe the million or so book illustrations they had carefully organised with book-structure metadata and described using clever machine tags. The BL team also took care not to annoy the Flickr API spirits, pacing their uploads so as not to trigger any alerts, and deliberately set out to build a community around the collection, primarily via the British Library Labs initiative.
  • The second gigantic upload, in 2014, was (also) mostly images cropped by a computer program. Written by a solo developer working in a Yahoo Research fellowship, the code was run over an extensive collection of content from the Internet Archive (IA) book digitisation program to crop out images on scanned book pages. Those were shoved into flickr.com using the API. The developer immediately reached the free account limits, so they negotiated through Yahoo senior management that these millions of images should become part of the Flickr Commons program in an Internet Archive Book Images (IABI) account. Since the developer was also loosely associated with IA, IA agreed to be the institutional partner in the Flickr Commons. That’s a requirement of joining the program: the account must be held by an organisation, not an individual.

These two uploads utterly overwhelmed the smaller Flickr Commons photography collections, even though the two approaches were so different.

The IABI account is 5x larger than all the other accounts combined. If you remove the two giants from the data, the average upload per account is just under 3,000 pictures.

These whopper accounts both have billions of views overall. These view counts are unsurprising, given that they completely dominated all search results in Flickr Commons. While the Flickr Commons’ first goal has always been to “increase public access to photography collections”, its secondary—and in my opinion, much more interesting—goal is to “provide a way for the public to contribute information.”

You can see from the two following graphs that a big photo count doesn’t imply deeper engagement. In fact, we’ve seen the opposite: the Flickr Commons members who enjoy the strongest engagement are those who put time and effort into engaging. Drip-feeding content, rather than dumping it all at once, also helps viewers keep up and get a good view of what is being published.

The fifth account in the most-faved data is the fabulous National Library of Ireland, which had about 3,000 photos at the time and excels at community engagement, as its 181,000 faves demonstrate.

In the comments data, IABI ranks 21st (about 3,000 comments) and the British Library 27th (about 2,000). The top-commented accounts are all in a groove of stellar community engagement.

Employees working in small archives (or large ones, for that matter) simply cannot compete with software that automatically crops images out of book scans and generates wordy metadata for each one. At the Flickr Foundation, we have a place in our hearts for the smaller cultural organisations and want to actively support their online engagements through the Flickr Commons program.

I remember when the IABI account went live. Even though I wasn’t working at Flickr or at the Flickr Foundation at the time, I thought it was a mistake to allow such a vast blast of not-photographs into the Flickr Commons, particularly the second massive collection, mainly because it had been described so broadly that its images would turn up in every search.

Fast forward to last year, in April, when, as my strange first step as Executive Director, I decided to act, in consultation and agreement with the staff at IA. We agreed to delete the gargantuan Internet Archive Book Images (IABI) account.

A couple of weeks later, people realised it had happened, and a riot of “Flickr is destroying the public domain” posts popped up. I had not prepared for this reaction, which was the opposite of the tone I want the Flickr Foundation to set! I’d consulted with the Internet Archive, and we had reached a consensus. But I was also unaware of the community that was enjoying the IABI account. I had presumed there was no community engagement, since nobody had logged into the IABI account since just after the giant upload in 2014. That was a mistake, I readily admit, but in my defence, the IA team shared the same impression when we discussed it.

The lone developer (who didn’t work at IA) had uploaded the millions of book images and did not engage with the community. The images were generated from many different institutions’ collections digitised through the Internet Archive’s wonderful book scanning initiative. Unfortunately, correct attribution for each institution had not been included in the initial metadata produced for each image. (This was later rectified by a code rewrite by Smithsonian Libraries and Archives, with support from Flickr engineering.) In some cases, the content was known to have no copyright, so it didn’t fit the Flickr Commons’ “no known copyright restrictions” assertion and could (or should) have been declared public domain. Add to that the content_type=Photo declaration and the broad, auto-generated metadata (along with some tagging to group images into their books, for example). In other words, a millions-of-things mess.

Despite my hesitation, we decided to restore the entire account. A restoration on this scale is an incredible engineering feat and an indication of the world-class team working behind the scenes at Flickr. We also set the correct content type and changed the licences on the restored images to CC0, as the Internet Archive does not claim any rights to them. This has the added benefit of classifying them more clearly for reuse.

What we are doing about it

We need to be more restrained when it comes to digital commons. These huge piles of stuff sound great, but they are rarely made with care by people. They’re generated en masse by computers and thrown online. (As a related aside, look at the millions of licensed pieces of content that are mined and inhaled to improve AI programs while their licences are ignored.)

The British Library acknowledged this, asking for interaction and effort from interested people, and stated explicitly that their 1 million images were “wholly uncurated.” People ultimately enjoyed hunting around in a millions-of-things pile for illustrations of things and made some beautiful responses to them. Indeed, one person managed to add 45,000 tags to the British Library’s Flickr Commons content. 45,000!

Perhaps I’m about to contradict myself again by saying that this scale of access was good at a base level, at least for computers and computation. But it wasn’t good inside the Flickr Commons program, and that’s why we need the Collection Development Policy: so we can encourage and nurture the viewing, enjoyment of, and contributions to our shared photographic history that we always wanted.

And that’s why we’re drafting the new policy in collaboration with the membership, so we can help Flickr Commons members know how to hold the shape of the container we’ve created instead of bursting it. 

With thanks to Josh Hadro, Martin Kalfatovic, Nora McGregor, Mia Ridge, Alexis Rossi, and Jessamyn West for your time and feedback on this post.

Flickr Commons: About Content Type and Advanced Search

This is a sister post to A millions-of-things pile: Why we need a Collection Development Policy for Flickr Commons. We’re writing this because our new policy changes what turns up in Flickr Commons searches.

On Flickr, images can be categorised as Photos, Screenshots, Illustration/Art, Virtual Photos, or Videos. The default setting for uploads across all accounts is content_type=Photo, so unless you change that default for new uploads, every image is classified as a photo. This starts to break down if you upload, say, the Engrossed Declaration of Independence or a wood engraving of Bloodletting Instruments.

Therefore, we’ve launched our new Collection Development Policy to ask Flickr Commons members to classify their images more specifically.

Default search settings

By default, Flickr search shows only Photos and Videos. That means that if a Flickr Commons member changes the content type of their uploads, those other types will drop out of the default search results.

This is the default setting: Photos and Videos
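To make that effect concrete, here is a minimal Python sketch that models the default filter as a set of allowed content types. It is an illustration only, not Flickr’s search implementation: the `search` function, the string labels, and the sample title “Harbour, 1890” are hypothetical.

```python
# A toy model of how a content-type filter decides what appears in search.
# Hypothetical: this is NOT Flickr's search code, just an illustration.

# Per the post, default search shows only Photos and Videos.
DEFAULT_TYPES = frozenset({"Photo", "Video"})


def search(items, content_types=DEFAULT_TYPES):
    """Return the titles of items whose content type passes the filter."""
    return [item["title"] for item in items if item["content_type"] in content_types]


# Hypothetical sample uploads; both start with the upload default, Photo.
photo = {"title": "Harbour, 1890", "content_type": "Photo"}
engraving = {"title": "Bloodletting Instruments", "content_type": "Photo"}
uploads = [photo, engraving]

print(search(uploads))  # ['Harbour, 1890', 'Bloodletting Instruments']

# A member reclassifies the engraving, so default search no longer shows it.
engraving["content_type"] = "Illustration/Art"
print(search(uploads))  # ['Harbour, 1890']

# Widening the filter (as in Advanced Search) brings it back.
print(search(uploads, {"Photo", "Illustration/Art"}))
# ['Harbour, 1890', 'Bloodletting Instruments']
```

The same item is “found” or “hidden” purely by whether its content type is in the active filter, which is why reclassifying a large account changes what default searches return.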

We know this can come as a surprise to viewers who were familiar with how things worked before we started asking Flickr Commons members to use the new policy. That surprise isn’t great, so we’re working on addressing it, and working with the flickr.com Customer Support team to get documentation online.

Part of that work is to show how the search works, so you can broaden it to include other content types. To do this, you open up the Advanced Search panel—on the right, under the header search box—and look for the “Content” heading. You can select or remove the different types of content as you wish.

Here you see a different selection: Photos and Illustration/Art

If you want to share a list of search results that also includes, say, images cropped from page scans of old books (which would now be marked as content_type=Illustration/Art), you can see that these settings show up as parameters in the search URL when you change them, like this:


The parameters highlighted in bold tell you the search is filtering for Photos (0) and Illustration/Art (2), with the two codes separated by a URL-encoded comma (%2C). So, as you adjust your content type settings, you can share URLs that take other people straight to the same results without them needing to change their Advanced settings.
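A minimal Python sketch of building such a shareable URL is below. The codes 0 (Photos) and 2 (Illustration/Art) come from this post; the base URL `https://www.flickr.com/search/`, the parameter names `text` and `content_types`, and the helper `share_search_url` are assumptions for illustration, so compare them against a URL copied from your own browser before relying on them.

```python
from urllib.parse import urlencode

# Content type codes as described in this post.
CONTENT_TYPE_CODES = {"Photos": 0, "Illustration/Art": 2}

# Assumption: the search page path and parameter names below are illustrative.
SEARCH_BASE = "https://www.flickr.com/search/"


def share_search_url(text, content_types):
    """Build a hypothetical shareable search URL for the given content types."""
    codes = ",".join(str(CONTENT_TYPE_CODES[name]) for name in content_types)
    # urlencode percent-encodes the comma as %2C (and spaces as '+').
    query = urlencode({"text": text, "content_types": codes})
    return f"{SEARCH_BASE}?{query}"


print(share_search_url("bloodletting", ["Photos", "Illustration/Art"]))
# https://www.flickr.com/search/?text=bloodletting&content_types=0%2C2
```

Letting `urlencode` do the escaping produces the same `%2C` separator described above, rather than hand-assembling the query string.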

We know this is a bit fiddly, but your default settings, whether on upload or as you search, should stick once you adjust them.

British Library & Flickr Commons: The many hands (and some machines) making light work

By Nora McGregor, Digital Curator in the Digital Scholarship Department of the British Library

Over a recent cup of coffee, George Oates, the indefatigable founder of Flickr Commons and now Executive Director of the Flickr Foundation, asked me whether any memorable moments stood out during our long relationship with the Commons since the British Library first joined nearly a decade ago. Of course, a multitude of inspired engagements instantly filled my mind like some exploding word cloud, and I could easily have prattled on until our cups dried up and the shop shutters went down. But one stands out from all the rest as the most shining example, and that is what we’ve come to call “The tale of Chico vs the Machine”.


British Library digitised image from page 57 of “A Strange Elopement. … Illustrations by W. H. Overend”


Our Flickr Commons story began in 2013 when we were looking for inventive ways to improve the discoverability of a new and exceedingly eclectic collection of 19th century illustrations we’d recently collated. Plucked from the pages of our digitised books by an algorithm built by Ben O’Steen in British Library Labs, this unique and sizable image collection was largely untagged and undescribed. Each image came with only the title of the book and the page it came from, and no other details to describe it, such as what the image itself depicted. We needed a curious, smart, engaged, and global audience to set their eyes and collective expertise on it, to help us tag and describe the images so we could create meaningful subcollections and improve searchability. We also needed a powerful API to work at scale with such a large collection and the millions of interactions it might garner. We happily found both in the Flickr Commons.

In late 2014, we had been chatting with artist Mario Klingemann (aka Quasimondo), who had happened upon this wild, wonderful and wholly uncurated collection of ours in Flickr Commons and was keen to create a series of artworks using the images. As part of his craft, he was mixing automatic image classification with manual confirmation to identify and tag tens of thousands of the images – ranging from maps to ships, portraits to stones – to discover more within the collection and, in turn, make them more discoverable for others.


16 x 16 Colourful Faces from the British Library Collection

By Mario Klingemann

The result of data mining the British Library Commons Collection, identifying colorful plates using some image analysis and subsequently using face detection to extract the faces contained therein.


As we were running some statistics on the algorithmically generated tags Mario was creating and adding back to individual images for us via the Flickr API (something in the region of 30,000 at that point, if I recall), we noticed that yet another user had already contributed something in the region of 45,000 tags to the collections. Assuming this user was similarly a dab hand with an image classification algorithm, we were absolutely gobsmacked to discover on closer inspection that, no, actually, these contributions had all been added by hand! Not only were these invaluable image tags being contributed manually by one person, but they were expertly and thoughtfully crafted, one by one. They did not simply identify general objects or themes in each image, like “ship” (which in itself is of incalculable value for improving search, particularly when no such simple descriptions existed at all). These tags were of a rare and profound quality. To illustrate, for 19th century biblical images, the user, known to us only by his handle, had added the specific biblical passage numbers to which each depicted scene referred!


British Library digitised image from page 394 of “The eventful voyage of H.M. Discovery Ship ‘Resolute’ to the Arctic Regions in search of Sir J. Franklin. … To which is added an account of her being fallen in with by an American Whaler after her abandonment … and of her presentation to Queen Victoria by the Government of the United States”


The sheer scale, quality and value of this singular Flickr user’s personal contribution was so staggering that we immediately sought them out to thank them personally and to ask if we could, at the very least, recognise their work publicly through our BL Labs Award programme. And yet, more surprises were to come. When we approached them with our gratitude and our offer of recognition, we were very politely rebuffed! They told us that, as they had been bedbound, it was they who wanted to express their gratitude for the opportunity to remain active in the world in some meaningful way. Days spent trawling through and tagging such a wild and unruly collection, in the knowledge that they were helping others to find these same gems, was reward enough. I can tell you, it was a response that no one in our team will ever forget. We tried a few more times to shower them with accolades in some agreeable way, but every time our overtures were politely declined on the same grounds.

This memory makes my heart swell, and it’s a tale that so perfectly encapsulates the variety of valuable interactions – from the very intimate and human to the technologically innovative and computationally driven – that the Flickr Commons community and platform have supported.

To give just one example, since 2015, 50,000 maps have been found and tagged by humans and machines working alongside each other, individually or as part of community events. They’ve all been georeferenced and are now being added back into the British Library catalogue as individual collection items in their own right, bringing direct benefits to current and future users of our historical image collections as more wonderful images are surfaced.

Screenshot of an old map on a newer map

Explore the georeferencer or the British Library’s Flickr Albums.

Every tag contributed, whether expertly crafted by human hand or machine-learned by an algorithm, has helped to make thousands, if not millions, of unseen historical images from British Library collections more discoverable, and we simply could not have gotten this far in curating this massive and wonderful collection without the Flickr Commons.
