OpenRefine is a powerful data manipulation tool, but it’s also a testament to the transformative potential of open source collaboration. Born as a closed-source tool by Metaweb and later adapted for Google and Wikidata, OpenRefine has become a cornerstone for researchers, journalists, and contributors across the globe. From supporting investigative journalism to commemorating history through projects such as the Survey of Scottish Witchcraft database, its impact is vast and varied.
In this interview with Martin Magdinier, Project Manager of OpenRefine, we delve into the challenges, milestones, and exciting developments shaping OpenRefine’s journey and explore how it continues to empower a diverse and growing community of users and contributors.
What motivated the creation of OpenRefine, and how has the project evolved over time? Could you also share notable success stories or high-impact use cases that highlight its journey?
It’s very difficult to know where OpenRefine made a difference because people typically use it locally and don’t need to report anything back to us. But sometimes users report publicly about their use of the tool, for instance in research articles or on social media.
One instance that felt quite significant was the report that OpenRefine had been used as part of an investigative journalism project around the Danske Bank money laundering revelations in 2018. Attendees of the Data Harvest conference told me it had been used in many other investigative journalism projects.
I also really admire the Scottish Witchcraft project for their work commemorating witch-hunting practices. As a byproduct of their work curating their database, they made an awesome OpenRefine tutorial that I have been recommending as a great intro to our Wikidata integration.
How does OpenRefine balance the need for powerful data manipulation tools with user privacy concerns?
A few years ago, we realized it would be good to have a legal entity for the project and started to look for fiscal sponsors. We applied to the Software Freedom Conservancy, and as part of the application process, they had a look at our code base to check for any red flags concerning intellectual property. They found that the tool depended on the org.json Java library, which was released under the “JSON license”. This license is similar to the MIT license but with an additional clause: “This Software shall be used for Good, not Evil.” Because of this clause, the license is not considered an open source license by various bodies.
Being unable to join the SFC because of this dependency was holding us back. Addressing it required removing the library used for JSON serialization, a significant effort that unfortunately broke compatibility with many plugins.
However, addressing the dependency issue made OpenRefine fully compliant with the open source BSD licenses, a valuable step. Although the SFC declined to sponsor us for other reasons, we joined Code for Science & Society, a welcome conclusion to this bumpy journey.
How do you engage with and support the OpenRefine community?
Our community is quite diverse, both geographically and in their fields of work, so there isn’t a single place where the whole community gathers. Our forum is meant to welcome as broad a spectrum as possible, but it’s in English only so far, and we are aware that not everyone is comfortable with the Discourse platform. Project members also take part in support and discussions about OpenRefine in other channels: on Discord, Telegram groups, and wikis, at conferences and training events organized by various institutions, and so on. We also run a biyearly user survey (see the 2024 results), which we try to advertise in many of those channels, so that we get responses from as representative a population as possible, but of course gathering opinions is very hard. We have recently added support for in-tool notifications about such surveys, so that we get a better chance to hear the voices of all users.
What recent successes has OpenRefine achieved, and what exciting developments or features can users look forward to in the future?
Participating in Google Summer of Code and Outreachy has been immensely valuable. These programs bring talented interns who tackle meaningful projects and improve the codebase, while also encouraging us to enhance our documentation and onboarding processes. However, retaining contributors beyond the internship period has been challenging, unless their work aligns with funded projects that allow continued involvement.
Can you discuss any collaborations or partnerships that have been particularly beneficial for OpenRefine?
The collaboration with the Wikimedia movement has been transformative for OpenRefine. Initially designed for Freebase, the project lost momentum after its shutdown. Adapting OpenRefine for Wikidata filled a crucial gap in data upload infrastructure, leading to millions of contributions across Wikidata, Wikibase, and Wikimedia Commons. This integration also brought new users to OpenRefine, revitalizing the project.
How does OpenRefine integrate with other open-source projects and platforms?
Contributing to OpenRefine is much easier and more meaningful if you have some experience as a user. While many contributors join for internships or coursework, understanding how the tool is used helps identify meaningful improvements. That’s why our contributor documentation urges potential contributors to learn the basics of OpenRefine’s features before diving into their contributions.
I would also recommend that contributors say hi on our forum and let us know what motivates them to contribute. Sometimes it’s not so easy for others to tell, and it feels to me that it would be easier for us to help you if we understood your overall interest. I have started a small experiment to document my own experiences when contributing to other projects, and I would of course be thrilled to have the same sort of feedback from a new OpenRefine contributor. I am aware of quite a few things that we should improve, but I’d be curious to see which ones come up – and maybe there would even be some I never thought of!