Episode 2: transcript

AT: We know that a lot of the technological stack of AI systems is open. It’s based, founded, on open code. That doesn’t solve any of the problems of the black boxes we discussed, of possible harms. I think we need to take the spirit of open source, of openness, but really look for some new solutions.


[00:00:22] SM: Welcome to Deep Dive: AI, a podcast from the Open Source Initiative. We’ll be exploring how artificial intelligence impacts free and open-source software, from developers to businesses, to the rest of us.


[00:00:37] SM: Deep Dive: AI is supported by our sponsor, GitHub. Open-source AI frameworks and models will drive transformational impact into the next era of software; evolving every industry, democratizing knowledge, and lowering barriers to becoming a developer. 

As this evolution continues, GitHub is excited to engage and support OSI’s deep dive into AI and open source and welcomes everyone to contribute to the conversation.

No sponsor had any right or opportunity to approve or disapprove the content of this podcast.


[00:01:08] SM: Welcome, everyone. Today we are meeting with Alek Tarkowski, Director of Strategy at the Open Future Foundation, a European think tank for the open movement. He is a sociologist, an activist, and a strategist, active for a long time in social movements. He is also on the board at Creative Commons. Welcome, Alek. Thank you for giving time to us.

[00:01:30] AT: Hello, Stefano. Thank you for the invitation.

[00:01:33] SM: Let’s start talking about artificial intelligence, how it’s affecting common life. How are these applications that you see being deployed into society? How is that affecting real-life people?

[00:01:46] AT: We certainly live at an interesting time, a time of technological change, which has probably been the case for as long as I’ve been an adult, for 30 years, but there is a sense that there’s something new, right, that these so-called AI technologies are really different from the previous waves of internet technologies.

Now, I think the interesting thing is that you said we can see them. Actually, I think the trick is that in many cases, we do not see them. There’s also a lot of confusion about what is an AI technology, and there’s also some confusion between what is already happening and what we’re expecting to happen, which I think is typical of these emergent technologies. They function somewhere between fiction, prototype, deployment, and mainstream, right? That’s the curve they’re on.

[00:02:32] SM: Some magic here and there. 

[00:02:33] AT: There’s a lot of magic and a lot of people like to sprinkle a lot more magic than there is. I also like a term that’s sometimes used alternatively to artificial intelligence, which is automated decision-making. It sounds a bit technical and it, of course, means something different, but I think it’s a nice term to deploy. Basically, automated decision-making says there are situations where humans no longer decide about you, that there are some systems that do that. In situations where, let’s say, traditionally some bureaucrat in the city would decide maybe which school your child will go to or whether you are eligible for some social support, more and more often this will be done by automated systems, of which artificial intelligence systems are one specific category.

[00:03:17] SM: Yeah. It’s really fantastic how that term conveys very clearly what we are talking about. We’re talking about an AI system that is making decisions for you. It’s really clear, rather than the magic.

[00:03:29] AT: Yeah. I like it, because it also connects the futuristic conversation on AI with issues we’re already aware of. There are many situations where you don’t need advanced technologies, but still decisions are made, not by humans. There are some cases in Poland where, for instance, decisions about providing unemployment benefits were made by what was dubbed an automated system. There was even a big investigation made by the Panoptykon Foundation. They discovered that the system is actually an algorithm that can be described in a spreadsheet.

It was a really super simple system. It gathered input in the shape of 10 questions that the public official asks the person applying for the benefits, and it applied some very basic functions to say yes or no. The question was, so is this okay, because there wasn’t really any AI hiding inside? There was suspicion that there is, but it wasn’t confirmed. I think the answer was, yes, it’s still an issue. More importantly, more and more often, these systems are actually AI-powered, right? There is some component of machine learning happening inside. We should probably expect that in the coming years, there’ll be more and more such systems.

[00:04:40] SM: Yeah. You introduce a very important piece there, where you discovered that the system in Poland for unemployment actually had a very simple algorithm that could be inspected and was easy to understand and maybe even fix, to spot unfairness or mistakes. With proper AI systems, that is really not available. It’s one of those obscurities that, inside neural networks especially, are hard to diagnose or hard to investigate in order to predict their outcomes, too. In this case, we have real life being impacted by automated decision-making. How are regulators approaching this issue? Do they know? Are they noticing?

[00:05:23] AT: At least in Europe, they do. I think it’s a good time to do it, because I believe that these AI systems are not yet deployed en masse. You make a valid point. That’s the big challenge with them. The issues are the same as with all automation, but the ways of addressing them are a lot harder, because there’s this, I think, beautiful symbol of the black box, right? That hides things inside. That’s exactly the case with these systems. Their complexity basically makes it much harder to analyze them, assess their impact, and so on.

I participated in a study done by AlgorithmWatch, a German foundation, called Automating Society, which looked at cases of the deployment of such AI systems in Europe. To be honest, there aren’t too many yet, at least ones that are publicly known. Here, a caveat is probably needed. You don’t find them if you look, let’s say, in cities, in small companies, in a national government, but obviously, then there are these huge platforms, which we know are extensively and more and more employing machine learning mechanisms, but in a way that is not really clear. We all use search engines that by now are almost certainly AI-powered.

We use social networks that filter content, most probably with AI technologies. Again, the black box appears, there’s no certainty. It’s a bit of a weird moment. You can say, “No, I don’t really see these technologies around me.” You can probably be just as correct in saying, “Hey, they’re everywhere. Any app you choose on your phone, they might be there.”

[00:06:56] SM: Right, exactly. I mean, from routing and orienting through the streets of an unknown city to calling a cab. I’ve always wondered how much of my decisions to walk through a street or drive through a street were driven by advertisers at some point. The doubts remain because we really don’t know from the outside. There is no label applied to these applications to say this is being influenced by this and that.

So you mentioned that the European Union seems to be extremely active on the AI front and they published the AI Act. Can you give us a little bit of an overview of what that is or what stage it is at?

[00:07:34] AT: The AI Act proposal was published last year, as one in a series of several regulatory measures, which is really a big European regulatory push on digital, right? We have the recently adopted Digital Markets Act and Digital Services Act, which regulate platforms. We have a whole package of data governance mechanisms, which by the way, I think do connect with AI conversations. Then we have the AI Act. I mentioned this study, Automating Society, which showed that there aren’t that many systems yet deployed, because I think it shows this is exactly the right moment to have a conversation about AI regulation.

I’m happy it’s happening right now because, as we know, policies are deployed more slowly than technology. You need to give them time. I think Europe, by passing these laws, is giving itself time. The question is, of course, what kind of regulation? Is it a good regulation? Let’s maybe briefly go over the document and what it proposes. Basically, I think the key category there is risk. Really, Europe has been developing an approach they call Trustworthy AI. In this approach, the biggest question for AI to be trustworthy is whether it is risky or not. The regulation doesn’t really cover all cases of AI. It really focuses on two issues. What kinds of AI uses or technologies are so dangerous that they should be outright banned? What technologies, or contexts in which they are used, are high risk? So are risky to an extent that you really need to regulate their use.

This is basically what this Act tries to do. In terms of the uses that are banned, it’s a very short list. It includes subliminal distortion, which is a bit of an almost science-fiction-sounding category, but really, and this is interesting, the regulator believes there are uses of AI that can subliminally affect humans and this should be banned. But then the more realistic ones are bans on social scoring, the mechanisms that usually are described as being deployed in China, where as a citizen, you get scored on how good a person you are. These are meant to be banned in Europe.

There is a ban on real-time biometric identification, so basically technologies that take data from all the cameras deployed in urban spaces and that let you identify people in real time. These are meant to be banned, although there are some carve-outs, basically for public security. Then the last category is technologies that exploit vulnerabilities. So they find some ways maybe to make elderly people do something against their wish, making use of the fact that they don’t understand technology. These are really high-level risks. You can argue whether it’s a good list or not. I am happy that Europe is thinking about banning, for instance, social scoring.

[00:10:29] SM: That’s very interesting. I wonder how it will be classified if you have five stars as a customer.

[00:10:35] AT: And then you get discounts. The question is whether that would fall into that category, or if it’s just the scoring for the government that is prohibited. I understand that it’s mainly for the government, and also this idea that you’re starting to combine sources. This is, I think, the big risk: that you have some five-star rating of how well you party in the club, which I could imagine happening. Then that data is shared with your employer and further on goes on to determine your, I don’t know, health portfolio, right? All these scenarios, I think, are the dangerous ones, but really, when you say social scoring, the problem is that then someone really gives you a score. There is some information that, for instance, in China, they’re considering that the score could determine whether you get a passport, right? Some basic rights are limited.

[00:11:23] SM: Yeah. I’ve seen similar experiments also starting in some cities in Italy, where they were scoring citizens about how well they were behaving in certain settings, so that they could get discounts on taxes like trash disposal and things like that. I don’t know how much AI was involved there, but in any case, very scary propositions.

[00:11:45] AT: I think the broader category is that of the high-risk situations, and here, some things they’re considering are uses of AI in the context of employment, all kinds of work-related HR decisions. Uses in education and related to vocational issues: scoring of students, how well they study, trying to determine who they will be. Uses in law enforcement, such as these ideas of predictive policing. There are some high-profile cases from the US where they attempted to use previous data on crimes and arrests to determine who will commit a crime again.

People seem to love these scenarios because they come out of science fiction movies, but they really are rife with risks that affect basically the basic rights of citizens, right? If a system deems you guilty, basically, before you commit a crime, I think that’s really serious. Another area that is high risk is migration. I live in Poland, where we just had a huge wave of refugees from Ukraine. I think this suddenly becomes very relevant. These are people, again, who are very vulnerable, and there are ideas how to deploy systems that can really be, basically, inhuman. The last category is justice and democracy, again a very fundamental issue, that you shouldn’t toy with democracy using AI technologies.

[00:13:07] SM: This is one part of the AI Act. It’s about the technologies that are too dangerous.

[00:13:12] AT: The whole question is, so what regulation do you introduce? The proposal comes with a list of some measures that includes basically, if you wanted to summarize it, three things. The first is impact assessments and monitoring of deployment of these systems, so that they’re not left alone, that someone asks questions: what will happen if I introduce these technologies in a school system? Second thing is transparency. You mentioned labels. Am I aware that there’s an AI system there? Am I aware of its decisions? Can I be maybe told on what basis it made the decision?

The third category is human oversight. In what situations, for instance, you might be able to ask, hello, I would like this decision to be reviewed by a human, right? Instead of a machine. It of course gets more complicated, but basically Europe is thinking that when a use of AI can be called high risk, these sorts of regulations start to apply. Of course, the huge debate that started immediately and has been running for the last year is whether these measures are sufficient. There’s of course a group of people who say no, we need much harder protections of persons’ basic rights. Of course, there’s another camp that says these measures will curb innovation too much. There’s now quite an intense policy debate happening on this.

[00:14:32] SM: Who’s participating in these debates? What kind of people? What kind of groups do you feel like in Europe are influencing the conversation? 

[00:14:40] AT: In a way it’s almost a cliché, like in most policy debates. You have the industry and the activists. These are really the two strongest forces, and I on purpose say activists, because I think the challenge with this regulation, like with many digital policies, is that it is very hard to involve everyday people. Europe actually did a very interesting project last year called the Conference on the Future of Europe, where it really gave voice to citizens through different means. You could submit proposals online, which then all were taken into account in the so-called Citizen Assembly.

These were situations where randomly selected Europeans served a function a bit like members of the Parliament. Of course, they didn’t pass laws, but they agreed on a set of recommendations that were sent to the European Commission, and the Commission promised at least to look at them. What happened there, which I think is very telling, is that people were given really a broad range of issues. You’d be happy to hear that there were a lot of proposals on open-source policies, that was somehow very strong, but ultimately, the outcome showed that the average European really gets the message about privacy. There was an ask for privacy to be respected.

Still, the basic issue of providing access to internet and technology is important for people, but other than that, they didn’t mention any of the things that are being regulated. They don’t mention platforms. They don’t mention AI. They don’t mention data. Why? I think it’s just too complex. This role is played by civil society activists, who are mainly digital rights activists. The policies they focus on are those that will protect basic rights, protect citizens. I think they are a very strong voice in the debate.

Obviously, industry is the other voice, which has a very, I would say, by now obvious line. Usually they say regulation is bad. I don’t think anyone in Europe now will say that no regulation of AI is good. I think for these sorts of extremely risky scenarios, there’s agreement they should be banned. But then the industry is very quickly ready to say that some of the measures around transparency, around disclosure of how it works, are just going to be challenging for innovative business in Europe.


[00:16:54] SM: Deep Dive: AI is supported by our sponsor, DataStax. DataStax is the real-time data company. With DataStax, any enterprise can mobilize real-time data and quickly build the smart, highly-scalable applications required to become a data-driven business and unlock the full potential of AI. With AstraDB and Astra Streaming, DataStax uniquely delivers the power of Apache Cassandra, the world’s most scalable database, with the advanced Apache Pulsar streaming technology in an open data stack available on any cloud.

DataStax leads the open-source cycle of innovation every day in an emerging AI everywhere future. Learn more at datastax.com.

No sponsor had any right or opportunity to approve or disapprove the content of this podcast.


[00:17:38] SM: It looks like Europe is really taking different standards from the United States, where no such regulations seem to be on the horizon, at least not at this level. I’m glad to hear it, because with this research, what the Open Source Initiative is trying to do is understand what the frameworks are, the same way that we have done with Open Source software. We have provided a way to identify what are the basic needs for developers and citizens to enjoy life in digital space. We’d like to have something similar at least for AI to say, “Look, we can innovate, we can do regulation, we should really pay attention to these specific acts.”

It seems like there are some interesting patterns that are already emerging from these early conversations we had, where we want to be able to inspect, for example, the models and understand what these AI systems are really suggesting. Why are they coming up with some decisions? We’ll need to keep on having this conversation. It’s not going to be simple to solve. In fact, at this point, it will be interesting to understand, since these are new technologies and they’re being introduced now in the markets, and they’re being regulated. Can you give examples of past regulations that have impacted new technologies as they were coming in?

[00:18:56] AT: In Europe, of course, when you ask that, everyone immediately thinks of GDPR, the regulation that provides data protection rules. I think it was adopted over five years ago, so it’s a very good time to see what happens with such regulation. You have to be really humble about the change it causes, right? It’s not easy to implement. It requires a lot of effort to bring to life, and everyone is willing to admit it, sometimes it backfires, right? Even a very simple thing, there’s this technical term that’s thrown around in European policy debates, which is harmonization of law.

Basically, what this term says is that we have in Europe almost 30 member states, each one with different law systems, and sometimes the EU passes laws that are unified for the whole of the EU, but sometimes the way it works is that they pass a directive, which then gets adjusted to the local context. Then you can get to a point where you’re thinking you have one rule, for instance, for giving consent or for regulating AI in education, but you then find out that it actually works completely differently in Italy, Poland, and the Netherlands. You get into some huge mess. That’s a challenge for any company that tries to build a pan-European business. It’s a challenge for citizens to understand. It’s a challenge for policymakers.

Then you have simple lessons learned, like try to harmonize it. Then try to have unified rules. But with this AI regulation, I think what is really interesting, and maybe this goes back a bit to your question about who’s present in the debate, what I’m really interested in is these rules for high-risk situations. These rules are not just meant to protect citizens. I think there’s, rightly so, a lot of debate that basically asks the questions, how are we going to be safe? How will our rights be protected? I think there are more questions that need to be asked, and which Open Source communities or Open Content communities are very good at asking, which is, how will we make this technology productive in a way that is at the same time sensible, reasonable, and sustainable?

I think this is the space where this question should be asked, when you think of things like impact assessment or transparency or labeling, because I think you can use the same tools, interesting ideas. Let’s say, registers of AI systems. In one approach they are just meant to limit this technology, right? The technology that is seen as risky, maybe even dangerous. But from a different perspective, it simply creates a framework in which you can ask questions: how can this technology be used well? Right? Because I think this is something that can be forgotten when we only talk about risks, that there are positive uses of these technologies. There’s a huge promise that you can find ways of using data to the public benefit, but it requires smart regulation.

[00:21:43] SM: Right. We’ve heard of AI systems that are now capable of folding proteins, or predicting how proteins fold, so that these are already helping researchers in the biotech industry to investigate some more promising paths. So the machine is not really telling how to solve the biology problem, but it’s giving paths that seem to be more promising. It has some risks, in fact, but there is some very helpful and very interesting technology in there. What you were saying, as Open Source and Open Content communities, what we are very good at doing, and for many years we’ve been capable of doing that, is to find good uses and put the good technology into the hands of many developers, many content creators.

[00:22:31] AT: I think the trick here is that the goals are the same, but probably we need to think about new tools, right? I come from a tradition of open content, of Creative Commons, which borrows heavily from the Open Source philosophy and methods and basically deployed this most basic, but also extremely functional tool, which is an open license, right? A licensing mechanism has really been solving a lot of issues around access, around sharing of content, of intellectual property, of code.

I think this is why this debate about AI is interesting: at some point, it shows the limits of just saying open it up, right? We know that a lot of the technological stack of AI systems is open, it’s based, founded, on open code. That doesn’t solve any of the problems of the black boxes we discuss, of possible harms. I think we need to take the spirit of open source, of openness, but really look for some new solutions.

[00:23:26] SM: Absolutely. The very basic difference that I see is that code and data are very, very clear, very clearly defined and very clearly separated in the context of traditional software. But when it comes to AI, then data becomes an input to a model, the software becomes something that consumes that model. They get entangled in a new way that we need to explore and understand more. We need to help the regulators to understand that too, because it’s early. There are very few activists that probably understand exactly what’s going on inside the systems and the impact is also different.

[00:24:03] AT: Which brings us to the issue of capacity, which I think is important. These policy debates in Europe often, I think, focus so much on the law itself that they give a sense to the participants of the debates that really the issues get solved by law, by regulation alone. I don’t think it’s true, because one thing that law is very good at is, for instance, protecting and reducing harms. One thing that’s very hard to do with law is building capacity. You cannot pass a law that says let’s have greater capacity in the public sector to understand AI, to deploy AI, to cooperate on shared systems that are open source and have machine learning inside, which I think is the scenario we’d like to see.

That’s why there’s always the second side of policies, which is about funding policies, research policies, which is a completely different field, right? I think they should be seen together, because I think the challenge, which I’m sure all of the world faces, but Europe in particular, is how to add these capacities. We know that these systems are deployed a lot faster by commercial actors. Regulations like the AI Act, which I haven’t mentioned but is the interesting thing, are targeted at those who develop and deploy these technologies. Not at the users, but basically at the creators, the companies, for instance, from which the public sector will most probably lease these technologies in some public-private partnerships.

This is one side of the equation. This idea that you need to look at what business does with these technologies. But then we look at, let’s say, the public sector. If you think about this list of high-risk areas, if you hear about things like education, vocational training, a lot of this is public, right? Sort of, if you try to picture a school system. I really like to think about education and think about the skills of people there, not just in an individual school, but let’s say the city-level school system. Maybe even in the ministry, really, their capacity to deal with complex systems is low. Unless we raise it, they will basically be dependent on vendors.

Of course, you can have a scenario, and this is the AI Act scenario, where you then regulate the vendors to make sure they do good, but I’m really also interested in scenarios where we think about how do we build the capacity, right? How can these Open Source systems be deployed in communities, in the case of education? Okay, it’s a bit hard to imagine that educators themselves will do it, but maybe there can be some specialized units and experts within the system, who have the capacity to understand these technologies and to work with them.

[00:26:37] SM: You mentioned that among the various conversations, the various regulations that the European Union has deployed, there is one that is focusing on data. Can you talk a little bit about that, because I think it’s connected?

[00:26:50] AT: Indeed, that’s connected, but not too many people mention that. Policymaking is such a siloed endeavor. You have the AI crowd and then you have the data crowd. Of course, there’s some overlap, but it’s as if it’s two different realities, and in the end, obviously, AI is fueled by data. Europe has a European strategy for data, which has several really bold elements. Ones that I find interesting, because they break with the logic that basically the markets will solve everything, the logic that you can treat data as property. There’s an act that has already been passed, called the Data Governance Act, and one that’s currently being discussed, called the Data Act. One of the really bold ideas behind them is this concept called Common Data Spaces.

It’s still a bit vague, but basically, Europe envisions that in key areas, let’s say health or transportation or industry, you will have these interoperable shared digital spaces in which data flows between different actors, probably both private, public, and civic, maybe not completely openly. This will for sure not be open data, maybe not necessarily for free or in the freemium model, but nevertheless, also not in the model where everything is licensed and under proprietary control.

This seems to be really influenced by ideas like the commons. Of course, the term here is used vaguely, but basically, there’s some governance, some management of data as a common good, as a shared good. As I said, the outlines are not clear, but I think this is a really fascinating idea. We get fascinated mainly by technologies. AI is a fascinating term, but sometimes these setups proposed by policies, I think, are just as fascinating.

[00:28:36] SM: Also, it looks to me like Europe is the only country that has ratified the right to data mining. Is that in the Copyright Act?

[00:28:46] AT: It is. Yes, in the copyright directive.

[00:28:48] SM: Right. Can you briefly highlight the right to data mining? 

[00:28:52] AT: That’s another piece of the puzzle that comes from a different silo, but in the end, fits right into the AI conversation, because basically, the term data mining describes a lot of what you want to do with AI. You want to take a pile of big data and run all sorts of computational techniques that will give you new insights and new knowledge. These are increasingly methods that could be branded as AI, but are traditionally called text and data mining. This has been the big debate in the copyright directive. This has been framed as an intellectual property issue, because basically, what they try to solve is this challenge. There’s a lot of data available today, but you’re not allowed to use it, right? You can almost even scrape it from the internet, but someone might have copyrights to this data or some other kinds of rights.

Then the rules that were adopted are not broad enough to my liking, mainly because they limit this to non-commercial research activities and institutions, right? It’s good for scientific research and it’s a much-needed freedom that basically universities and other research institutes need, but again, if you put together all these ideas about what we could do with data if we ensure access to data, probably this regulation could have been broader. I think within this new European data strategy, of course, there’s always the question of how these rules will play together, but there are some new ideas about how data will be shared.

For instance, there’s a really strong proposal that could make producers of IoT devices, electric scooters, voice assistants be required to share the data they collect, which could open up really a huge possibility to create really new uses and would really transform this market. Text and data mining, I think it’s a term that really strongly connects basically with research, while these other approaches don’t really focus on research, but also look for other ways of using data.

[00:30:48] SM: In the end, the copyright directive is allowing for data to be opened by default for the purpose of data mining, but only for non-commercial purposes and research purposes.

[00:30:58] AT: For research. Yes. Which always is the same question. It’s a big shift. It’s better than nothing, but is it big enough? I think the problem with policies, with the copyright directive that’s finished, with these ongoing acts like the AI Act, is that you have a sense that these are once-in-a-generation situations, right? Basically, once in 20 years, you can expect this to happen. The Digital Services Act that just was passed builds on top of the so-called Information Society Directive, which is now 20 years old, exactly 20 years. If you think that you have a chance once in a generation, you really would like to get it right. Then the bigger challenge: you’d really like it to be future-proof, because you want your law to work in the reality where technologies are deployed almost every year that can change the balance of things.

This is maybe the big question, also the big debate. It’s one thing whether you get the rules right. But I think in almost every act, you need to put these provisions that make it future-proof. Okay, you might have a list of four scary uses of AI, but do you have a process for reviewing it? Will you, in three or five or 10 years, come back and review that list or review your mechanisms for transparency? I think, if you don’t include that, there’s a high chance that basically technology will make your law obsolete.

[00:32:21] SM: Right. It’s a fine balance to maintain: future-proofing versus allowing innovation and regulating.

So from the perspective of the industry, so members of the Open Source Initiative and industry members, advocacy groups and individuals, what do you think we should be doing?

[00:32:38] AT: Well, first of all, you should engage in these policy debates. I hope that’s clear. I think some companies see these issues, but basically, it requires probably some redefining of what it means to be focused on open source, right? It’s just one piece of the puzzle. It’s a very important piece of the puzzle. I think we also all need to take responsibility for that bigger puzzle. This applies also to the community I’m more familiar with, which is the open content community. By the way, I really appreciate the work done, for instance, by the Wikimedia Foundation, which I think has exactly this broad perspective. It thinks of knowledge and content, but it also thinks of itself as a platform and engages in debates on platform regulation, but even more broadly, often thinks of the whole ecosystem.

I think this is what we all need to be doing, because there’s a need to take responsibility for that. I really also appreciate what you said, that the Open Source Initiative and other industry actors are really trying to understand what openness means in this new technological reality. What does it mean for AI to be open? Because I think on one hand it requires reinforcing some well-tested recipes. Open sourcing code is a really good idea, I think. But on the other, it really requires some creative reworking of what it means for things to be open.

[00:33:56] SM: Have you thought about it? What does it mean to be open for you? What would be your wish for an open AI?

[00:34:03] AT: We are actually doing a project, maybe that’s worth mentioning, where we’re looking at a very specific case, a case that has been close to my heart, but also really important for the open content, for the Creative Commons community, which is the case of AI training datasets. It’s a story that’s by now almost 10 years old. It’s a story of huge numbers of photographs of people being used to build datasets with which facial recognition models and technologies are built. There’s a famous or infamous dataset called MegaFace, which has 3 million photographs packaged into a tool that’s an industry standard for benchmarking, for deploying new solutions.

It’s also a system that’s quite controversial. When you dig deep inside, it’s not entirely clear consent was given. A lot of people say that even though the formal rules of licensing, of Creative Commons licensing, were met, they see some problem. These uses are unexpected, risky. When you look at the list of users of this dataset, you suddenly see military industry, surveillance industry, and people really have some disconnect between the ideas they had in mind when sharing their photos. Yes, agreeing that this will be publicly shared, but usually they have a vision of some very positive internet culture. Then you find out that there are these uses that you find scary, and I think it’s a case we’re investigating because this is exactly what we’d like to see, that we have a discussion about law.

Also, we have a discussion about social and community norms, because when you want to address these risks, I don’t think you can regulate it all. You can, of course, say what’s illegal, but beyond that, you really have to think about standards. That’s one thing I’d like to see. And connected with that, I think we don’t have specific solutions for a lot of issues, but there’s one way that gives a good possibility of attaining good outcomes, and this is participatory decision making.

Really, I favor approaches that really try to draw different stakeholders, even individual users, into the process. In the UK, the Ada Lovelace Institute organized so-called citizen panels on the use of AI, on the use of big data, on biometric technologies, where people really express their views on how they would like to see these systems work in their lives. They are not experts. They will not give you technical expertise, but it turns out that if you explain the technology to them, they come up with pretty reasonable ideas about what kind of world they would like to live in.

[00:36:37] SM: That’s very important indeed, to engage and to talk to regulators. I learned a long time ago that doing that does not mean getting dirty.

[00:36:46] AT: The other way around, it’s great for regulators to also reach out to people and really treat them as partners and not just limit the conversation to some narrow group of stakeholders.

[00:36:56] SM: Indeed. Well, thank you very much for your time, Alek. Is there anything else that you would like to add that you think we haven’t covered?

[00:37:04] AT: I’m keeping fingers crossed that Europe will develop one more piece in its fascinating array of new regulations, which some people observe with fascination, some people observe with awe or fear. I think of policies as powerful means of world-building. We like to think that it’s technologists who create the world today, but I think policy has a lot of opportunity to shape the world as well. I just hope more and more people believe that and engage in policy processes, because they’re not only important, but they can also be fun.

[00:37:39] SM: Wonderful. Thank you very much, Alek. 


[00:37:42] SM: Thanks for listening. Thanks to our sponsor, GitHub. Remember to subscribe on your podcast player for more episodes. Please review and share. It helps more people find us. Visit deepdive.opensource.org, where you can find more episodes, learn about these issues, and donate to become a member. Members are the only reason we can do this work. If you have any feedback on this episode, or on Deep Dive: AI in general, please email contact@opensource.org.

This podcast was produced by the Open Source Initiative, with help from Nicole Martinelli. Music by Jason Shaw of audionautix.com, under a Creative Commons Attribution 4.0 International license. Links in the episode notes.

[00:38:24] ANNOUNCER: The views expressed in this podcast are the personal views of the speakers and are not the views of their employers, the organizations they are affiliated with, their clients, or their customers. The information provided is not legal advice. No sponsor had any right or opportunity to approve or disapprove the content of this podcast.


