Open Data for AI
In the AI era, the meaning, significance, and rules of open data are being challenged.
Under this initiative, we seek to explore and answer three questions:
- What paradigm shifts are there for open data in the AI era?
- What new rules do we need to regulate the open flow of data required for AI?
- How can we incentivize further data openness in the AI era? Why do data holders choose to open, or not to open, their data?
In 2020, we collaborated with the Shanghai Baiyulan Open AI Institute to launch the first project under this initiative, the Mulan-Baiyulan License. The project explores what an open license optimized for AI training datasets should look like in a local context.
In 2021, we set out to understand why various academic institutions release open data, which open data in China are available for AI use, and how this openness can be further promoted. We also asked how the Baiyulan open license could become part of this ecosystem and help facilitate and support data openness.
We are still exploring these questions, in part because the pandemic interrupted some of our work. We nevertheless plan to continue our research in this field and, drawing on our work on data institutions, to promote the use of public data for AI.