Overview and Aims

AAAI Spring Symposium on Computational Approaches to Scientific Discovery

Background

The discovery of new scientific knowledge is widely viewed as one of the highest forms of human accomplishment. This makes the topic worthy of study from a computational perspective, both to understand how humans carry out this complex activity and to produce tools that can assist in the process. Shrager and Langley (1990) have referred to both branches of this endeavor as computational scientific discovery. For many years, philosophers of science commonly declared that the discovery process relied on a "creative spark" that we could never fathom (e.g., Popper, 1961). They claimed there could be no "logic" of discovery and they viewed the idea that computer programs could automate this process as ludicrous. This view was shared by the general populace and even by many scientists.

The situation changed in the 1970s when a number of teams developed AI systems that tackled this problem. Some demonstrated the ability to rediscover insights from the history of mathematics (Lenat, 1977) and physics (Langley, 1981), whereas others produced novel results in specific disciplines like organic chemistry (Lindsay et al., 1980). The innovators came from different backgrounds – psychology, computer science, and philosophy – but shared a commitment to understanding discovery in computational terms. Research in this tradition has continued for over forty years, leading to publications in both the AI (Džeroski & Todorovski, 1995; Bradley et al., 2001; Bridewell et al., 2008) and scientific (Valdes-Perez, 1994; Langley, 2000) literatures. Multiple books have addressed the topic (Langley et al., 1987; Glymour et al., 1987; Shrager & Langley, 1990; Džeroski & Todorovski, 2007; Addis et al., 2019) and four symposia (1989, 1995, 2001, and 2013) have convened experts in the area, but the problem did not receive widespread attention.

At least, that was the case until recently, when researchers from different fields, especially physics and applied mathematics, began devoting energy to the topic. Work from this perspective emphasizes continuous mathematics rather than symbolic structures and often relies on recent developments in statistical learning and high-powered computers. Recent examples include Brunton, Proctor, and Kutz (2016), Chen et al. (2018), Cranmer et al. (2020), Fries, He, and Choi (2022), Iten et al. (2020), Raissi and Karniadakis (2018), Wu and Tegmark (2019), and Zhang and Lin (2018), many of them inspired by Schmidt and Lipson's (2009) visible work in the area. These researchers have made parallel progress on problems related to those addressed by the older community, but there has been little contact between the two groups to date.

Despite the differences between these two paradigms, they also share some important assumptions about the scientific enterprise and the nature of discovery that suggest the potential for bridging the current gap between them. These include beliefs that:

Scientific discovery is not solely about data or about models, but about finding relations between them;
Discovered models should not only make predictions but also provide deeper accounts that are consistent with scientific theory; and
Discovery should produce models that are interpretable and stated in established scientific formalisms.

These three points are distinctive features of science that set it apart from other intellectual pursuits. They also impose constraints on the discovery task that mean traditional techniques for machine learning will not suffice to address it, at least without modification or adaptation. The general agreement on these issues is encouraging and suggests paths for crossing the paradigmatic divide.

Aims for the Symposium

A symposium on computational scientific discovery is timely because it can let us strengthen connections between the different communities that have been pursuing this important problem. The paradigms have been operating separately for some years, they come from different intellectual traditions, and they publish in different venues. As a result, they have developed their own terminologies and often rely on different computational methods. For instance, "classic" discovery work has emphasized heuristic search through a space of discrete structures, whereas many recent efforts have favored search through a space of continuous parameters. The publication styles also differ, with classic papers emphasizing data structures and the newer paradigm focusing on mathematical analyses.

We hope that the symposium will help overcome this conceptual divide, increase interaction among the groups, and encourage a more unified research community with a common view of scientific discovery. To achieve these aims, it will include introductory review talks about the problems, methods, and results in each paradigm to familiarize participants with them. We will also organize sessions around types of scientific models (e.g., qualitative structures, causal relations, numeric equations, processes) rather than methodological paradigms. In addition, we will ask speakers to abstract away from their algorithmic and mathematical details and instead to focus on other facets of their work, in particular:

The original discovery problem they wanted to solve;
How they formulated the problem in computational terms;
What data and knowledge they provided to their system;
How they represented the system's inputs and outputs;
The space of candidate models that the system searched;
What criteria it used to evaluate candidate models; and
How they interpreted results that the system generated.

Structuring talks in this way should increase communication among discovery researchers who come from different backgrounds and who favor different methods, drawing their attention instead to underlying commonalities.

We believe that strategies of this sort will lead to a community of researchers who appreciate different approaches to a shared set of problems. We also hope they will offer novel insights to participants about some key questions, such as:

What are the major tasks that arise in scientific discovery and how are they related?
What are common challenges and responses that cut across alternative discovery paradigms?
What computational abstractions are useful across different scientific disciplines?
How can we evaluate discovery systems in ways beyond models' predictive accuracies?
How can we move beyond informal and subjective accounts of model interpretability?

Although the meeting may not produce definitive responses to these questions, we expect it to suggest promising research programs. This in turn should foster development of a shared agenda for continued work in the area. The meeting will also encourage movement beyond the current glamour of "big data", as observations remain sparse in many scientific fields.

Schedule and Submissions

The symposium will run two and a half days, from morning on Monday, March 27, through midday on Wednesday, March 29, 2023. A few invited speakers will open the meeting with surveys of different paradigms and we plan to reserve the final morning for talks and discussions about evaluation challenges. The organizing committee will select other presentations from submitted abstracts rather than formally reviewing full papers. We hope to obtain 18 to 20 technical presentations in this manner. We will aim for a leisurely pace that encourages discussion and ensures high-quality interaction among participants.

Authors should submit abstracts of proposed talks through the AAAI Spring Symposium EasyChair site by January 30, 2023, AoE (Anywhere on Earth), extended from January 15. These should be no longer than one page in 11-point, single-column style using PDF format, in effect providing an outline of the proposed content. Submissions should include one or two references that are representative of the authors' work on scientific discovery, along with links to these papers. The organizing committee will select abstracts that cover a broad range of problems and approaches. Please indicate whether you will be able to give your talk in person at the meeting. A few virtual presentations may be possible, but the committee will favor in-person talks. When making a submission, make sure to select the track Computational Approaches to Scientific Discovery rather than the track SSS-23, which is a holdover from an earlier version of the EasyChair site.

For submissions that report implemented systems, the committee will give preference to ones that address the seven facets of the work listed above (from the original discovery problem through the interpretation of results) rather than ones that emphasize algorithmic and mathematical details. We also welcome abstracts on other topics, such as proposals for how to evaluate discovery systems or precise specifications of new discovery problems. Authors of accepted abstracts will be asked to write a full paper for distribution to symposium participants on this Web site. They may also be encouraged to submit an expanded version for possible inclusion in a special issue of Machine Learning or another refereed journal that reports results presented at the event.

Logistics and Attendance

The symposium will take place at the Hyatt Regency San Francisco Airport, located at 1333 Bayshore Highway, Burlingame, California, 94010. The hotel's telephone number is +1 650-347-1234 and its email address is salessfobu@hyatt.com. If you would like to stay at the hotel, you may still be able to reserve a room at this link, although the deadline for receiving the AAAI rate has passed.

Symposium participants, including each of the speakers, must register for the meeting whether they plan to attend in person or virtually. The registration link will remain open until the meeting begins and there will be no extra charge for late registration.

AAAI will require in-person attendees to show proof of COVID vaccination or a valid medical or religious exemption (e.g., from an employer or doctor), as well as a negative COVID test taken no more than 48 hours before you pick up your badge. AAAI will offer a full refund to anyone who cannot attend because he or she tests positive for COVID during the week before the event.

Symposium Organizers

Youngsoo Choi / Lawrence Livermore National Laboratory / choi15@llnl.gov / https://people.llnl.gov/choi15/
Sašo Džeroski / Jozef Stefan Institute / saso.dzeroski@ijs.si / http://www-ai.ijs.si/SasoDzeroski/
J. Nathan Kutz / University of Washington / kutz@uw.edu / http://faculty.washington.edu/kutz/
Pat Langley / Stanford University, CA / langley@stanford.edu / http://www.isle.org/~langley/

References

Addis, M., Lane, P. C. R., Sozou, P. D., & Gobet, F. (Eds.). (2019). Scientific discovery in the social sciences. Cham, Switzerland: Springer.

Bradley, E., Easley, M., & Stolle, R. (2001). Reasoning about nonlinear system identification. Artificial Intelligence, 133, 139–188.

Bridewell, W., Langley, P., Todorovski, L., & Džeroski, S. (2008). Inductive process modeling. Machine Learning, 71, 1–32.

Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113, 3932–3937.

Chen, R. T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. K. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems 31. Curran Associates.

Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., et al. (2020). Discovering symbolic models from deep learning with inductive biases. Advances in Neural Information Processing Systems 33. Vancouver, Canada.

Džeroski, S., & Todorovski, L. (1995). Discovering dynamics: From inductive logic programming to machine discovery. Journal of Intelligent Information Systems, 4, 89–108.

Džeroski, S., & Todorovski, L. (Eds.). (2007). Computational discovery of scientific knowledge. Berlin: Springer.

Fries, W. D., He, X., & Choi, Y. (2022). LaSDI: Parametric latent space dynamics identification. Computer Methods in Applied Mechanics and Engineering, 399, 115436.

Glymour, C., Scheines, R., Spirtes, P., & Kelly, K. (1987). Discovering causal structure: Artificial intelligence, philosophy of science, and statistical modeling. San Diego: Academic Press.

Iten, R., Metger, T., Wilming, H., et al. (2020). Discovering physical concepts with neural networks. Physical Review Letters, 124, 010508.

King, R. D., Rowland, J., Oliver, S. G., et al. (2009). The automation of science. Science, 324, 85–89.

King, R. D., Whelan, K. E., Jones, F. M., et al. (2004). Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427, 247–252.

Langley, P. (1981). Data-driven discovery of physical laws. Cognitive Science, 5, 31–54.

Langley, P. (2000). The computational support of scientific discovery. International Journal of Human-Computer Studies, 53, 393–410.

Langley, P., Simon, H. A., Bradshaw, G. L., & Zytkow, J. M. (1987). Scientific discovery: Computational explorations of the creative processes. Cambridge, MA: MIT Press.

Lenat, D. B. (1977). The ubiquity of discovery. Artificial Intelligence, 9, 257–285.

Lindsay, R. K., Buchanan, B. G., Feigenbaum, E. A., & Lederberg, J. (1980). Applications of artificial intelligence for organic chemistry: The DENDRAL project. New York, NY: McGraw-Hill.

Popper, K. R. (1961). The logic of scientific discovery. New York: Science Editions.

Raissi, M., & Karniadakis, G. E. (2018). Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357, 125–141.

Schmidt, M., & Lipson, H. (2009). Distilling free-form natural laws from experimental data. Science, 324, 81–85.

Shrager, J., & Langley, P. (Eds.). (1990). Computational models of scientific discovery and theory formation. San Francisco: Morgan Kaufmann.

Todorovski, L. (2011). Equation discovery. In C. Sammut & G. I. Webb (Eds.), Encyclopedia of machine learning. Boston, MA: Springer.

Valdes-Perez, R. E. (1994). Human/computer interactive elucidation of reaction mechanisms: Application to catalyzed hydrogenolysis of ethane. Catalysis Letters, 28, 79–87.

Wu, T., & Tegmark, M. (2019). Toward an artificial intelligence physicist for unsupervised learning. Physical Review E, 100, 033311.

Zhang, S., & Lin, G. (2018). Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A, 474, 20180305.