Open-Ended Library Learning in Unsupervised Program Synthesis
Pairing a neuro-symbolic model with library learning to facilitate program induction seems a
promising way of fostering open-ended innovation, by leveraging the robustness, expressivity,
and extrapolative capabilities of programs. This paper presents Open-Ended Dreamer (OED),
an unsupervised, diversity-oriented neuro-symbolic learner built upon DreamCoder.
Open-Endedness: Open-ended refers to a process that produces increasingly diverse and complex artifacts or behaviors. An archetypal inspiration is evolutionary processes, which, through iterations blending variation and selection mechanisms, have brought about a stunningly diverse array of lifeforms on Earth.
Compositionality for Open-Endedness: Not only do the majority of lifeforms display intricate nested and modular structures, but a considerable portion of human knowledge, whether factual or procedural, also embodies a hierarchical and compositional arrangement. We continually create more sophisticated artifacts and behaviors by building upon previous achievements, from genomics to the internet via soccer (cf. Figure 2), whether they are explicit or implicit, originating from collective or individual efforts. The capacity to reuse, combine, transfer, or expand upon past discoveries plays a pivotal role in human learning and adaptation, and appears essential for fostering open-ended innovation.
Program Synthesis with Library Learning
In the pursuit of fostering open-ended innovation and harnessing compositional power, programs appear as a compelling candidate representation.
Indeed, programs are an adaptive, robust, and expressive way to encode both artifacts and behaviors, valued for their logical, interpretable, and extrapolative qualities.
Program synthesis, paired with library learning, makes it possible to preserve regularities in a potent and versatile way, as functions of a given arity that can be reused and composed.
DreamCoder
For instance, a neuro-symbolic model like DreamCoder (Ellis et al., 2021) has proven versatile in tackling multiple domains via program induction, from creative tasks such as LOGO generation to more classic inductive programming tasks such as regressing physical laws.
Framing learning as program induction, it aims to jointly learn a hierarchical library of building blocks (represented as programs) and the intuition (a neural model) for how to compose them to solve a given task.
DreamCoder adopts a Bayesian framework, in which the library encodes a prior over programs and the neural model aims to approximate the Bayesian posterior.
This work extends DreamCoder towards open-endedness, with ways to encourage diversity in both program generation and abstraction, and by removing the need to specify tasks, which is relaxed via a Quality Diversity (QD) approach.
Our diversity-oriented neuro-symbolic model, Open-Ended Dreamer (OED), balances efficiency and novelty pressures in order to learn a compact set of diversity-enabling programs, organised in a hierarchical library.
Like DreamCoder, and as schematized in Figure 3, OED learns through an iterative process alternating generation, selection, abstraction, and training steps. At each iteration, elite programs are selected and organised into niches, before possibly being abstracted in the library-learning stage.
Supervision Relaxation via Quality Diversity
By adopting a QD approach, OED relaxes DreamCoder's supervision needs by eliminating the requirement for handpicking tasks.
Following MAP-Elites (Mouret and Clune, 2015), OED maintains a multi-dimensional archive of phenotypic elites A, partitioned into niches along predefined behavioral characteristics.
In our experiments, we use a 3D behavioral characterization (G, W, B) composed of the (relative) center-of-gravity height G, the width W, and the number of blocks B of the generated structure.
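As a minimal sketch of the archive structure described above (the class names, bin counts, and feature normalization are illustrative assumptions, not OED's actual implementation), a MAP-Elites-style archive can discretize the (G, W, B) features into niche keys and keep the best elite per niche:

```python
from dataclasses import dataclass

@dataclass
class Elite:
    program: str    # serialized program (placeholder representation)
    features: tuple # behavioral features (G, W, B), normalized to [0, 1)
    novelty: float  # novelty score used for within-niche competition

class Archive:
    """MAP-Elites-style archive: one elite per behavioral niche."""

    def __init__(self, g_bins=4, w_bins=4, b_bins=4):
        self.bins = (g_bins, w_bins, b_bins)
        self.cells = {}  # niche key -> Elite

    def niche(self, g, w, b):
        # Discretize each feature into its bin index.
        gi = min(int(g * self.bins[0]), self.bins[0] - 1)
        wi = min(int(w * self.bins[1]), self.bins[1] - 1)
        bi = min(int(b * self.bins[2]), self.bins[2] - 1)
        return (gi, wi, bi)

    def add(self, elite):
        key = self.niche(*elite.features)
        held = self.cells.get(key)
        # Keep the more novel of the incumbent and the candidate.
        if held is None or elite.novelty > held.novelty:
            self.cells[key] = elite
```

Two programs landing in the same niche compete, so the archive size is bounded by the number of niches while still spanning the behavior space.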
Generation with Exploration Both a top-down generation phase, following the generative model induced by the library L, and a bottom-up generation phase, following the conditional neural model Q_L, take place at different stages of each iteration. Each generation phase encompasses an enumeration procedure and a sampling procedure.
argmax_ρ P(ρ | z, L) ∝ P(z | ρ) P(ρ | L)
where z denotes a niche, ρ a program, and L the library. The likelihood P(z | ρ) evaluates whether a program ρ fits in a given niche z, and the prior P(ρ | L) reflects how well it fits in the library, i.e. how likely it is under the generative model induced by L.
P(ρ | L, z) ≥ δ
where δ is a lower bound on the posterior. Adjusting the exploration ratio α ∈ [0, 1] (the ratio of sampled to enumerated programs) and the sampling temperature parameter τ (Boltzmann exploration) makes it possible to balance the efficiency pressure, as illustrated in our experiments below.
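The two exploration knobs above can be sketched together: filter candidates by the posterior bound δ, then sample among survivors with a Boltzmann distribution at temperature τ. This is an illustrative stand-in, working in log space, with a hypothetical function name; it is not OED's actual sampler.

```python
import math
import random

def boltzmann_sample(candidates, log_posteriors, tau=1.0, log_delta=-10.0, rng=random):
    """candidates: programs; log_posteriors: log P(rho | L, z) for each.

    Keeps only candidates whose log-posterior clears log_delta, then
    samples one proportionally to exp(log_posterior / tau).
    """
    kept = [(c, lp) for c, lp in zip(candidates, log_posteriors) if lp >= log_delta]
    if not kept:
        return None
    # Numerically stabilized softmax with temperature tau.
    m = max(lp for _, lp in kept)
    weights = [math.exp((lp - m) / tau) for _, lp in kept]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for (c, _), w in zip(kept, weights):
        acc += w
        if acc >= r:
            return c
    return kept[-1][0]
```

A high τ flattens the distribution (more novelty pressure); τ → 0 approaches greedy selection of the highest-posterior program (more efficiency pressure).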
Novelty Selection At each iteration, the most novel programs are selected from previously generated programs (top-down and bottom-up frontiers) to form the elites E, under the condition that their novelty score is above a novelty threshold δ_nov.
The local novelty score of a program ρ in a niche z is measured as the mean phenotypic distance to the k nearest archive elements of the same niche; in most of our experiments, the phenotypic distance relies on features from a pre-trained visual network (VGG-16; Simonyan and Zisserman, 2014).
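This local novelty score can be sketched directly from its definition (the embeddings stand in for the visual-network features; the function name and the empty-niche convention are illustrative assumptions):

```python
import math

def novelty_score(embedding, niche_embeddings, k=3):
    """Mean distance from a phenotypic embedding to its k nearest
    neighbors among the archive elements of the same niche."""
    if not niche_embeddings:
        return float("inf")  # an empty niche makes any program novel
    # Sort all distances and average the k smallest.
    dists = sorted(math.dist(embedding, e) for e in niche_embeddings)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)
```

A program is then kept as an elite when `novelty_score(...) > δ_nov`.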
Abstraction
In essence, the abstraction phase aims to grow the library from useful and novel building blocks found in elite programs across niches.
It first refactors the elites to find common program fragments, before abstracting these fragments into new library programs.
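A toy illustration of the refactoring step (programs represented as nested tuples; real library learning, as in DreamCoder, refactors under a description-length objective, so this only conveys the idea of proposing shared fragments as abstractions):

```python
from collections import Counter

def subtrees(expr):
    """Yield every subexpression of a program tree."""
    yield expr
    if isinstance(expr, tuple):
        for child in expr[1:]:
            yield from subtrees(child)

def propose_abstraction(elites):
    """Return the compound fragment shared by the most elites, or None."""
    counts = Counter()
    for program in elites:
        for s in set(subtrees(program)):  # count each fragment once per elite
            if isinstance(s, tuple):      # ignore bare primitives
                counts[s] += 1
    shared = [(s, n) for s, n in counts.items() if n >= 2]
    if not shared:
        return None
    # Prefer fragments shared by more elites, then larger fragments.
    return max(shared, key=lambda sn: (sn[1], len(str(sn[0]))))[0]
```

The chosen fragment would then be named, added to the library, and the elites rewritten to call it.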
Neural Training The neural model aims to learn how to compose the programmatic abstractions to create novel programs. It encodes a distribution over the elements of the library conditioned on the local context: taking as input a latent niche vector z and the current state s passed as an image (e.g. an image of the tower under construction), it outputs a probability tensor encoding the probability of the next token given the previous tokens of the growing program. Querying the neural model Q auto-regressively thus produces a program. It is trained both on replays, i.e. elites from previous iterations, and on dreams, i.e. programs sampled from the generative model. It plays the crucial role of biasing the search, as it drastically reduces the breadth of exploration in the combinatorially exploding space of programs.
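The auto-regressive querying can be sketched as follows; the `model` here is a hypothetical stand-in returning a token distribution, not OED's actual network, and the `<end>` token and signature are illustrative assumptions:

```python
import random

def sample_program(model, z, state, max_len=20, rng=random):
    """Query the model auto-regressively to produce a token sequence.

    model(z, state, tokens) -> dict mapping next-token candidates to
    probabilities, conditioned on the niche vector z, the state image,
    and the tokens emitted so far.
    """
    tokens = []
    for _ in range(max_len):
        probs = model(z, state, tokens)
        # Sample one token from the returned distribution.
        r = rng.random()
        acc = 0.0
        for tok, p in probs.items():
            acc += p
            if acc >= r:
                break
        if tok == "<end>":
            break
        tokens.append(tok)
    return tokens
```

At each step the distribution ranges over the current library, so newly abstracted functions immediately become available as tokens.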
As a first testbed, we demonstrate our approach on the Tower Domain introduced in DreamCoder (Ellis et al. (2021), Appendix S2.1.4). We rely on the same monadic, continuation-passing encoding of imperative programs. In this domain, programs produce a sequence of actions encoding the instructions of a hand moving over a 2D discretized canvas, which can drop horizontal or vertical blocks. The state of the hand includes both a 1D position and a boolean orientation. By default, the initial library is seeded with two basic control-flow primitives: for (for loop), and get/set, which gets and saves the current state of the agent's hand. We also provide movement-related and construction-related primitives: moveHand (advancing the hand), reverseHand (flipping the orientation), dropVerticalBlock, and dropHorizontalBlock. To define the niches, we adopt a 3D behavioral characteristic consisting of (1) the center of gravity (y-coordinate), (2) the width, and (3) the number of blocks of the tower structure generated from a program. The ecosystem's minimal criteria consist of lower and upper bounds on these values.
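The 3D behavioral characteristic for a tower can be computed directly from the blocks a program drops; this sketch assumes a simple (x, y, width, height) block representation, which is an illustrative choice rather than the paper's exact encoding:

```python
def behavior(blocks):
    """Compute (G, W, B) for a tower.

    blocks: list of (x, y, w, h) tuples with (x, y) the lower-left
    corner of each block on the discretized canvas.
    Returns the center-of-gravity height G (mass-weighted), the
    horizontal extent W, and the block count B.
    """
    if not blocks:
        return (0.0, 0, 0)
    masses = [w * h for _, _, w, h in blocks]
    centers = [y + h / 2 for _, y, _, h in blocks]
    g = sum(m * c for m, c in zip(masses, centers)) / sum(masses)
    left = min(x for x, _, _, _ in blocks)
    right = max(x + w for x, _, w, _ in blocks)
    return (g, right - left, len(blocks))
```

The minimal criteria then amount to checking that each component of the returned triple falls within its lower and upper bounds.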
The emphasis on implementation nested in the notion of innovation implies that candidate novel artifact-ideas will likely undergo strong environmental pressures (e.g. market, functional, or social pressures) when put to the test in the real world, heavily biasing the search space. It therefore seems legitimate to wonder whether strong inductive biases and environmental coupling are key to guiding the emergence of abstract structured knowledge and innovation. Generating and retaining 'stepping stones to everywhere' indeed seems too ambitious in such a combinatorially exploding space of programs in the absence of any environmental pressure (e.g. survival or functional constraints). Combining such a quest with some inductive bias appears to be a more realistic reflection of how symbolic knowledge formation occurs in social and natural systems. Moreover, it holds the potential for yielding more reliable and consistent innovations. We investigate different ways to bias evolution and guide the formation of symbolic knowledge, through either:
As illustrated in Figure 5, at the core of OED's search for diversity-enabling programs, and at the core of open-endedness, lies a delicate balance between environmental, efficiency, and novelty pressures.
These different pressures at play in selecting elites at each iteration may conflict.
Indeed, learning a language also means learning a strong inductive bias over the search space, which may impede diversity.
Convergent Evolution
Strong efficiency pressures, whether from environmental constraints (cf. (PHYSICS)) or from stronger language pressure (cf. (BASE)), or conversely a lower novelty pressure (as in the (A-N) run), seem to favor phenomena of convergent evolution, i.e. the repeatability of certain evolutionary outcomes.
Enhanced Exploration To mitigate the language bias, OED provides ways to loosen the efficiency pressure in favor of the novelty pressure, by promoting exploration in the generation step, notably by increasing the exploration ratio α weighting the sampling, or the sampling temperature τ. Early experiments on this enhanced exploration confirm that more diverse and novel programs are generated. However, by reducing the presence of regularities among elites, excessive exploration can impede the abstraction phase and therefore the growth of the library. Further work is needed to learn to adjust this balance automatically.
To assess the benefit of the different components of our model in the open-ended creation of diverse programs, we run several ablation studies:
While preliminary, our results suggest that neurally-guided library learning, enabling both more efficient program discovery and more sustained diversity, is a promising framing towards open-endedness. By building a library of programmatic abstractions, instead of solutions, our method preserves regularities in a more potent and versatile way than other diversity-oriented approaches. As we have observed, OED has the potential to guide the development of innovation using innate priors, biased metrics, and environmental constraints. Accentuating efficiency pressures, such as language bias and environmental factors, led to phenomena of convergent evolution. Library learning entails a growing amount of bias, which may have a negative impact on diversity. To mitigate this effect and foster creative divergence, it seems crucial to promote exploration and stochasticity, during both the generation and abstraction phases. However, by reducing the presence of regularities among elites, excessive exploration may impede the abstraction mechanism. Further investigation is necessary to determine the optimal balance between exploration and efficiency in library learning, which could be adjusted automatically depending on the task. The simplicity of the Tower Domain does not allow OED to reach its full potential for open-ended creation. To unleash its full capabilities, and to demonstrate the benefits of novelty-oriented library learning on downstream tasks, OED should be deployed in richer environments and at longer evolutionary scales. Another compelling development would be to use OED to produce increasingly sophisticated behaviors, rather than merely artifacts.
Email: clgl 'at' itu.dk Twitter: @claire__aoi
This project was funded by a DFF-Research Project grant (9131-00042B).