Introducing Flickypedia, our first tool

Building a new bridge between Flickr and Wikimedia Commons

For the past four months, we’ve been working with the Culture & Heritage team at the Wikimedia Foundation — the non-profit that operates Wikipedia, Wikimedia Commons, and other Wikimedia free knowledge projects — to build Flickypedia, a new tool for bridging the gap between photos on Flickr and files on Wikimedia Commons. Wikimedia Commons is a free-to-use library of illustrations, photos, drawings, videos, and music. By contributing their photos to Wikimedia Commons, Flickr photographers help to illustrate Wikipedia, a free, collaborative encyclopedia written in over 300 languages. More than 1.7 billion unique devices visit Wikimedia projects every month.

We demoed the initial version at GLAM Wiki 2023 in Uruguay, and now that we’ve incorporated some useful feedback from the Wikimedia community, we’re ready to launch it. Flickypedia is now available at https://www.flickr.org/tools/flickypedia/, and we’re really pleased with the result. Our goal was to create higher quality records on Wikimedia Commons, with better connected data and descriptive information, and to make it easier for Flickr photographers to see how their photos are being used.

This project has achieved our original goals – and a couple of new ones we discovered along the way.

So what is Flickypedia?

An easy way to copy photos from Flickr to Wikimedia Commons

The original vision of Flickypedia was a new tool for copying photos from Flickr to Wikimedia Commons, a re-envisioning of the popular Flickr2Commons tool, which copied around 5.4M photos.

This new upload tool is what we built first, leveraging ideas from Flinumeratr, a toy we built for exploring Flickr photos. You start by entering a Flickr URL:

And then Flickypedia will find all photos at that URL, and show you the ones which are suitable for copying to Wikimedia Commons. You can choose which photos you want to upload:

Then you enter a title, a short description, and any categories you want to add to the photo(s):

Then you click “Upload”, and the photo(s) are copied to Wikimedia Commons. Once it’s done, you can leave a comment on the original Flickr photo, so the photographer can see the photo in its new home:

As well as the title and caption written by the uploader, we automatically populate a series of machine-readable metadata fields (“Structured Data on Commons” or “SDC”) based on the Flickr information – the original photographer, date taken, a link to the original, and so on. You can see the exact list of fields in our data modeling document. This should make it easier for Commons users to find the photos they need, and maintain the link to the original photo on Flickr.

This flow has a little more friction than some other Flickr uploading tools, which is by design. We want to enable high-quality descriptions and metadata for carefully selected photos, not just bulk copying for the sake of copying. Our goal is to get high-quality photos on Wikimedia Commons, with rich metadata which enables them to be discovered and used – and that’s what Flickypedia enables.

Reducing risk and responsible licensing

Flickr photographers can choose from a variety of licenses, and only some of them can be used on Wikimedia Commons: CC0, Public Domain, CC BY and CC BY-SA. If it’s any other license, the photo shouldn’t be on Wikimedia Commons, according to its licensing policy.

As we were building the Flickypedia uploader, we took the opportunity to emphasize the need for responsible licensing – when you select your photographs, it checks the licenses, and doesn’t allow you to copy anything that doesn’t have a Commons-compatible license:

This helps to reduce risk for everyone involved with Flickr and Wikimedia Commons.
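As a rough illustration of that check, here is a minimal sketch in Python. It is not Flickypedia’s actual code: the `Photo` class, the `split_by_license` helper, and the license identifier strings are all assumptions made for the example, and only the four license names come from the list above.

```python
from dataclasses import dataclass

# The four Commons-compatible licenses named in this post.
# The identifier strings themselves are illustrative assumptions.
COMMONS_COMPATIBLE = {"cc0", "public-domain", "cc-by", "cc-by-sa"}


@dataclass(frozen=True)
class Photo:
    photo_id: str
    license_id: str  # hypothetical identifier, e.g. "cc-by"


def split_by_license(photos):
    """Return (allowed, blocked) lists based on the photo's license."""
    allowed = [p for p in photos if p.license_id in COMMONS_COMPATIBLE]
    blocked = [p for p in photos if p.license_id not in COMMONS_COMPATIBLE]
    return allowed, blocked


photos = [Photo("1", "cc-by"), Photo("2", "cc-by-nc"), Photo("3", "cc0")]
allowed, blocked = split_by_license(photos)
```

Keeping the blocked photos in their own list, rather than silently dropping them, means the interface can still show them and explain why they cannot be copied.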

Better duplicate detection

When we looked at the feedback on existing Flickr upload tools, one request came up again and again: people want better duplicate detection. There are already over 11 million Flickr photos on Wikimedia Commons, and if a photo has already been copied, it doesn’t need to be copied again.

Wikimedia Commons already has some duplicate detection. It’ll spot if you upload a byte-for-byte identical file, but it can’t detect duplicates if the photo has been subtly altered – say, converted to a different file format, or a small border cropped out.
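To see why byte-for-byte matching misses altered copies, here is a small sketch that compares file digests. Using SHA-1 is an assumption for illustration, since this post does not say which mechanism Wikimedia Commons uses, and the byte strings are stand-ins for real image files.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Hex digest of the file contents (SHA-1 chosen for illustration)."""
    return hashlib.sha1(data).hexdigest()


original = b"\xff\xd8\xff\xe0 pretend these are JPEG bytes"
identical_copy = bytes(original)
altered = original + b"\x00"  # stand-in for a re-encode or a small crop

# An exact copy has the same digest, so it is caught.
same = file_digest(original) == file_digest(identical_copy)

# Changing even one byte gives a different digest, so it slips through.
different = file_digest(original) != file_digest(altered)
```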

It turns out that there’s no easy way to find out if a given Flickr photo is in Wikimedia Commons. Although most Flickr upload tools will record the Flickr source somewhere, they’re not consistent about it. We found at least four ways to spot possible duplicates:

  • You could look for a Flickr URL in the structured data (the machine-readable metadata)
  • You could look for a Flickr URL in the Wikitext (the human-readable description)
  • You could look for a Flickr ID in the filename
  • Or Flickypedia could know that it had already uploaded the photo

And even looking for matching Flickr URLs can be difficult, because there are so many forms of Flickr URLs – here are just some of the varieties of Flickr URLs we found in the existing Wikimedia Commons data:

(And this is without some of the smaller variations, like trailing slashes and http/https.)

We’d already built a Flickr URL parser as part of Flinumeratr, so we were able to write code to recognise these URLs – but it’s a fairly complex component, and that only benefits Flickypedia. We wanted to make it easier for everyone.
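To give a flavour of what that recognition involves, here is a minimal sketch that pulls a photo ID out of two common URL shapes: photo pages and static image files. It is not our actual parser. The `find_photo_id` name and the regular expressions are assumptions written for this example, and it deliberately ignores other forms such as short `flic.kr` links.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# /photos/<user>/<photo id>, optionally followed by more path segments
_PHOTO_PAGE = re.compile(r"^/photos/[^/]+/(\d+)(?:/.*)?$")

# /<server>/<photo id>_<secret>[_<size>].<ext> on a staticflickr.com host
_STATIC_FILE = re.compile(
    r"^/(?:\d+/)+(\d+)_[0-9a-f]+(?:_[a-z0-9]+)?\.(?:jpg|png|gif)$"
)


def find_photo_id(url: str) -> Optional[str]:
    """Return the numeric Flickr photo ID in `url`, or None if not found."""
    if "://" not in url:
        url = "https://" + url  # tolerate URLs written without a scheme
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host in {"flickr.com", "www.flickr.com"}:
        match = _PHOTO_PAGE.match(parsed.path)
        return match.group(1) if match else None

    if host.endswith(".staticflickr.com"):
        match = _STATIC_FILE.match(parsed.path)
        return match.group(1) if match else None

    return None
```

Even this small version has to cope with http versus https, a missing scheme, trailing slashes, and extra path segments, which hints at why the real component is fairly complex.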

So we did!

We proposed a new Flickr Photo ID property, and the proposal was accepted. This is a new field in the machine-readable structured data, which can contain the photo’s numeric ID. This is a clean, unambiguous pointer to the original photo, and dramatically simplifies the process of looking for existing Flickr photos.

When Flickypedia uploads a new photo to Wikimedia Commons, it adds this new property. This should make it easier for other tools to find Flickr photos uploaded with Flickypedia, and skip re-uploading them.
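For a sense of what such a statement might look like as data, here is a minimal sketch that builds one as a Python dictionary. It assumes the general Wikibase JSON layout for a string-valued statement. The `flickr_photo_id_statement` helper is hypothetical, and the property ID is passed in as a parameter because this post does not state it; the `"P0000"` in the usage line is a placeholder, not the real property.

```python
import json


def flickr_photo_id_statement(property_id: str, photo_id: str) -> dict:
    """Build a string-valued statement holding a Flickr photo ID.

    `property_id` is whatever ID the Flickr Photo ID property has;
    it is a parameter here because the value is an assumption.
    """
    if not (photo_id.isascii() and photo_id.isdigit()):
        raise ValueError(f"Flickr photo IDs are numeric, got {photo_id!r}")

    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"type": "string", "value": photo_id},
        },
    }


# "P0000" is a placeholder property ID, used only for this example.
statement = flickr_photo_id_statement("P0000", "12345678901")
as_json = json.dumps(statement, indent=2)
```

Storing the ID as a plain string, rather than a full URL, is what lets other tools look for an exact match without parsing anything.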

Backfillr Bot: Making Flickr metadata better for all Flickr photos on Commons

That’s great for new photos uploaded with Flickypedia – but what about photos uploaded with other tools, tools that don’t use this field? What about the 10M+ Flickr photos already on Wikimedia Commons? How do we find them?

To fix this problem, we created a new Wikimedia Commons bot: Flickypedia Backfillr Bot. It goes back and fills in structured data on Flickr photos on Commons, including the Flickr Photo ID property. It uses our URL parser to identify all the different forms of Flickr URLs.

This bot is still at a preliminary stage, waiting for approval from the Wikimedia Commons community. Once approval is granted, we’ll be able to improve the metadata for every Flickr photo on Wikimedia Commons. It will also create a hook that other tools can use – either to fill in more metadata, or to search for Flickr photos.

Sydney Harbour Bridge, from the Museums of History New South Wales. No known copyright restrictions.

Flickypedia started as a tool for copying photos from Flickr to Wikimedia Commons. From the very start, we had ideas about creating stronger links between the two – the “say thanks” feature, where uploaders could leave a comment for the original Flickr photographer – but that was only for new photos.

Along the way, we realized we could build a proper two-way bridge, and strengthen the connection between all Flickr photos on Wikimedia Commons, not just those uploaded with Flickypedia.

We think this ability to follow a photo around the web is really important – to see where it’s come from, and to see where it’s going. A Flickr photo isn’t just an image, it comes with a social context and history, and being uploaded to Wikimedia Commons is the next step in its journey. You can’t separate an image from its context.

As we start to focus on Data Lifeboat, we’ll spend even more time looking at how to preserve the history of a photo – and Flickypedia has given us plenty to think about.

If you want to use Flickypedia to upload some photos to Wikimedia Commons, visit www.flickr.org/tools/flickypedia.

If you want to look at the source code, go to github.com/Flickr-Foundation/flickypedia.

Data Lifeboat: NEH Grant Update 1

By Ewa Spohn

And we’re off! Thanks to the Digital Humanities Advancement Grant awarded to us by the National Endowment for the Humanities, work on the Data Lifeboat has started as part of our Content Mobility program. We’ll be posting an update for you each month.

Hand-drawn sketch of a decentralization methodology, Feb 2nd, 2024

Excellent Lifeboat-related game and book brought in by Alex for our Kick-off

What’s a Data Lifeboat?

A quick recap for those not familiar with the concept, from our grant narrative:

“A Data Lifeboat is an archival piece of Flickr, not all of the 50 billion images and their metadata. For example, a Lifeboat might contain all the photos tagged with “sunflower” or all the Recipes to Share group submissions. Whatever facet of the data you can think of, you could generate a Data Lifeboat for it. We envision an archival sliver richer than a mere folder of JPGs: one where you can navigate the content to explore and understand its networked context. Even better, an archival sliver that is updated if things change at flickr.com. Our goals with this project are to create several rough prototypes of the software, develop a reasonably detailed understanding of the main technical challenges, prepare a survey of critical ongoing legal issues, and establish a robust design direction for further product development.”

This idea was born from two challenges: 1) Flickr contains a multitude of shared histories, yet it is owned and controlled by a corporation which could decide to close the service, and, as we’ve seen in the past, such closures can destroy cultural heritage; and 2) the Flickr archive is huge and, in its current form, impossible for any one archival institution to take on.

Flickr’s 50 billion or so photos reflect our diverse heritage, traditions, and history back to us in a unique way. The collection is also born digital, a massive advantage over conventional archives, because the photographers usually describe their pictures themselves as they publish. The pictures are also enriched by the network of social activity that surrounds them, which – again – is unique to the Flickrverse. Finally, this kind of volume is astonishing: Flickr and other platforms like it are orders of magnitude larger than our biggest cultural collections to date.

At the Foundation, we believe we must begin to treat this collection as we would our ‘traditional’ great libraries, archives, and collections. Time is of the essence: the commercial platforms that host these kinds of huge collections can (and do) disappear, effectively sinking our heritage along with them. Our hope is that a Data Lifeboat will carry Flickr images away from the possibility of a sinking ship unscathed. Our future plans include developing the idea of a “dock” in a “safe harbour” – somewhere specific for the Data Lifeboat to land and be preserved.

The scope of the grant

We’re using the NEH grant to create two identical prototype Data Lifeboats containing a selection of the Flickr Commons. This will (hopefully) be a richer archival format that allows for the exploration of content within its networked context, and one that can be updated when changes are introduced in Flickr.com. Importantly, we want to place these two Lifeboats in two different places, a proxy for our developing goal of “safe harbours” for them.

This phase of the project, making a demonstrable prototype, or prototypes, is scheduled to end mid-year.

Our crew

It’s an exciting and completely new thing, and the team working on it is multidisciplinary, drawn from both the Flickr Foundation and our Flickr Commons members and advisors:

  • George Oates, Project Director, who provides strategic and design input, and financial oversight
  • Alex Chan, Tech Lead, who is developing the core of our prototypes
  • Jessamyn West, Community Manager, who leads our communication with the Flickr Commons collaborators, the project advisors, as well as broader audiences
  • Ewa Spohn, Project Manager, who ensures the team sticks to the plan. And the budget

We’re excited to engage some of our Flickr Commons members directly for the first time, too. The Flickr Foundation team will be joined by staff from three of our member institutions:

And finally, our advisors, who bring a wide range of experience and knowledge and will help us make sure we build stuff for the long term:

Kick-off? Done.

We’re about to have our first all-hands meeting, although in late January we took advantage of Dietrich’s short visit to London to hold our first face-to-face workshop. 

Jenn, Ewa, George, Alex, Stef, and Dietrich (the photographer) at our kick-off at HQ

Over coffee and sugary snacks, we spent two days exploring decentralized storage and how it could be applied to archiving digital content, and thinking through a possible schema for the data in a Data Lifeboat.

Emerging questions

Many, many (#many) questions arose (for which we currently have no answers), for example:

  • Is a Data Lifeboat launched in response to an emergency or as part of regular housekeeping?
  • What must a Data Lifeboat contain? Could it initially just be a manifest and the images (which are large and expensive to process) are added later?
  • Who decides what is in (and out) of a Data Lifeboat and to what extent should it feel like an active selection?
  • Where are the ‘edges’ of the network surrounding a Flickr photo, and what is a holistic archive?
  • What existing digital asset management formats could (should) a Data Lifeboat be consistent with for it to be ‘docked’ successfully?

We were also very pleasantly surprised by the power of our lifeboat metaphor and how far we could stretch it to help coordinate our thinking! And thanks again to Dietrich for sharing time with us to crack the project open.

Next steps

Next up is our first all-hands meeting to bring the whole project team up to speed with the plan. That’ll be followed by a deep-dive review of digital asset management systems in cultural institutions and a survey of the legal rocks that a Data Lifeboat may encounter. We think that will give us enough to define some high-level requirements for the prototypes, so that development proper can start towards the end of the month.

Somewhere among all that we’re also planning a team expedition to a lifeboat museum to learn more about how lifeboats work in the physical world, but more about that in another blog post…

 

This work is supported by the National Endowment for the Humanities.

Introducing Flinumeratr, our first toy

by Alex

Today we’re pleased to release Flinumeratr, our first toy. You enter a Flickr URL, and it shows you a list of photos that you’d see at that URL:

This is the first engineering step towards what we’ll be building for the rest of this quarter: Flickypedia, a new tool for copying Creative Commons-licensed photos from Flickr to Wikimedia Commons.

As part of Flickypedia, we want to make it easy to select photos from Flickr that are suitable for Wikimedia Commons. You enter a Flickr URL, and Flickypedia will work out what photos are available. This “Flickr URL enumerator”, or “Flinumeratr”, is a proof-of-concept of that idea. It knows how to recognise a variety of URL types, including individual photos, albums, galleries, and a member’s photostream.
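Here is a minimal sketch of that kind of recognition, covering the four URL types mentioned above. It is an illustration, not Flinumeratr’s code: the `classify_flickr_url` function, the label strings, and the path patterns are assumptions based on how Flickr URLs commonly look.

```python
import re
from urllib.parse import urlparse

# Checked in order, so the more specific patterns come first.
_PATTERNS = [
    ("single_photo", re.compile(r"^/photos/[^/]+/\d+/?$")),
    ("album", re.compile(r"^/photos/[^/]+/(?:albums|sets)/\d+/?$")),
    ("gallery", re.compile(r"^/photos/[^/]+/galleries/\d+/?$")),
    ("photostream", re.compile(r"^/photos/[^/]+/?$")),
]


def classify_flickr_url(url: str) -> str:
    """Return a label for the kind of Flickr page, or "unknown"."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host not in {"flickr.com", "www.flickr.com"}:
        return "unknown"

    for kind, pattern in _PATTERNS:
        if pattern.match(parsed.path):
            return kind
    return "unknown"
```

Once the URL type is known, the next step would be to call the matching Flickr API method to list the photos, which this sketch leaves out.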

We call it a “toy” quite deliberately – it’s a quick thing, not a full-featured app. Keeping it small means we can experiment, try things quickly, and learn a lot in a short amount of time. We’ll build more toys as we have more ideas. Some of those ideas will be reused in bigger projects, and others will be dropped.

Flinumeratr is a playground for an idea for Flickypedia, but it’s also been a context for starting to develop our approach to software development. We’ve been able to move quickly – this is only my fourth day! – but starting a brand new project is always the easy bit. Maintaining that pace is the hard part.

We’re all learning how to work together, I’m dusting off my knowledge of the Flickr API, and we’re establishing some basic coding practices. Things like a test suite, documentation, checks on pull requests, and other guard rails that will help us keep moving. Setting those up now will be much easier than trying to retrofit them later. There’s plenty more we have to decide, but we’re off to a good start.

Under the hood, Flinumeratr is a Python web app written in Flask. We’re calling the Flickr API with the httpx library, and testing everything with pytest and vcrpy. The latter in particular has been so helpful – it “records” interactions with the Flickr API so I can replay them later in our test suite. If you’d like to see more, all our source code is on GitHub.
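To show the record-and-replay idea in miniature, here is a toy sketch that uses only the standard library. It is not vcrpy and does not use vcrpy’s API: the `Cassette` class, the `fake_fetch` function, and the example URL are all invented for this illustration.

```python
import json
import tempfile
from pathlib import Path


class Cassette:
    """Toy record/replay store keyed by URL (an illustration, not vcrpy)."""

    def __init__(self, path, fetch):
        self.path = Path(path)
        self.fetch = fetch  # function that performs the real request
        if self.path.exists():
            self.recorded = json.loads(self.path.read_text())
        else:
            self.recorded = {}

    def get(self, url):
        # Only call the real fetch function the first time a URL is seen.
        if url not in self.recorded:
            self.recorded[url] = self.fetch(url)
            self.path.write_text(json.dumps(self.recorded))
        return self.recorded[url]


calls = []


def fake_fetch(url):
    """Stand-in for a network call; records that it was invoked."""
    calls.append(url)
    return {"stat": "ok", "url": url}


with tempfile.TemporaryDirectory() as tmp:
    cassette_path = Path(tmp) / "cassette.json"
    # First run records the response to disk.
    first = Cassette(cassette_path, fake_fetch).get("https://api.example/photos")
    # Second run replays it from disk without calling fake_fetch again.
    second = Cassette(cassette_path, fake_fetch).get("https://api.example/photos")
```

The benefit in a test suite is the same as in this toy: once a response is recorded, the tests can run again without touching the network.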

You can try Flinumeratr at https://flinumeratr.glitch.me. Please let us know what you think!