Open-Ended Library Learning in Unsupervised Program Synthesis
Pairing a neuro-symbolic model with library learning to facilitate program induction seems a
promising way of fostering open-ended innovation, by leveraging the robustness, expressivity,
and extrapolative capabilities of programs. This paper presents Open-Ended Dreamer (OED),
an unsupervised, diversity-oriented neuro-symbolic learner built upon DreamCoder.
Open-Endedness: Open-ended refers to a process that produces increasingly diverse and complex artifacts or behaviors. An archetypal inspiration is evolutionary processes, which, through iterations blending variation and selection mechanisms, have brought about a stunningly diverse array of lifeforms on Earth.
Compositionality for Open-Endedness: Not only do the majority of lifeforms display intricate nested and modular structures, but a considerable portion of human knowledge, whether factual or procedural, also embodies a hierarchical and compositional arrangement. We continually create more sophisticated artifacts and behaviors by building upon previous achievements, from genomics to the internet via soccer (cf. Figure 2), whether they are explicit or implicit, originating from collective or individual efforts. The capacity to reuse, combine, transfer, or expand upon past discoveries plays a pivotal role in human learning and adaptation, and appears essential for fostering open-ended innovation.
Program Synthesis with Library Learning
In the pursuit of fostering open-ended innovation and harnessing compositional power, programs appear as a compelling candidate representation.
Indeed, programs are an adaptive, robust, and expressive way to encode both artifacts and behaviors, valued for their logical, interpretable, and extrapolative qualities.
Program synthesis, paired with library learning, makes it possible to preserve regularities in a potent and versatile way, as functions of a given arity that can be reused and composed.
DreamCoder
For instance, a neuro-symbolic model like DreamCoder (Ellis et al., 2021) has proven versatile in tackling multiple domains via program induction, from creative tasks such as LOGO generation to more classic inductive programming tasks such as regressing physical laws.
Framing learning as program induction, it aims to jointly learn a hierarchical library of building blocks (represented as programs) and the intuition (a neural model) for how to compose them to solve a given task.
DreamCoder adopts a Bayesian framework, in which the library encodes a prior over programs and the neural model aims to approximate the Bayesian posterior.
This work extends DreamCoder towards open-endedness, with ways to encourage diversity in both program generation and abstraction, and by removing the need to specify tasks, which is relaxed via a Quality Diversity (QD) approach.
Our diversity-oriented neuro-symbolic model, Open-Ended Dreamer (OED), balances efficiency and novelty pressures in order to learn a compact set of diversity-enabling programs, organised in a hierarchical library.
Like DreamCoder, and as schematized in Figure 3, OED learns through an iterative process alternating generation, selection, abstraction, and training steps. At each iteration, elite programs are selected and organised into niches, before possibly being abstracted in the library-learning stage.
Supervision Relaxation via Quality Diversity
By adopting a QD approach, OED relaxes DreamCoder's supervision needs by eliminating the requirement for handpicking tasks.
Following MAP-Elites (Mouret and Clune, 2015), OED maintains a multi-dimensional archive of phenotypic elites A, partitioned into niches along predefined behavioral characteristics.
In our experiments, we use a 3D behavioral characterization (G, W, B) composed of the (relative) center-of-gravity height G, the width W, and the number of blocks B of the generated structure.
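As a minimal sketch of the archive structure described above (the class names, bin counts, and feature normalization are illustrative assumptions, not OED's actual implementation), a MAP-Elites-style archive can discretize the (G, W, B) features into niche keys and keep the best elite per niche:

```python
from dataclasses import dataclass

@dataclass
class Elite:
    program: str    # serialized program (placeholder representation)
    features: tuple # behavioral features (G, W, B), normalized to [0, 1)
    novelty: float  # novelty score used for within-niche competition

class Archive:
    """MAP-Elites-style archive: one elite per behavioral niche."""

    def __init__(self, g_bins=4, w_bins=4, b_bins=4):
        self.bins = (g_bins, w_bins, b_bins)
        self.cells = {}  # niche key -> Elite

    def niche(self, g, w, b):
        # Discretize each feature into its bin index.
        gi = min(int(g * self.bins[0]), self.bins[0] - 1)
        wi = min(int(w * self.bins[1]), self.bins[1] - 1)
        bi = min(int(b * self.bins[2]), self.bins[2] - 1)
        return (gi, wi, bi)

    def add(self, elite):
        key = self.niche(*elite.features)
        held = self.cells.get(key)
        # Keep the more novel of the incumbent and the candidate.
        if held is None or elite.novelty > held.novelty:
            self.cells[key] = elite
```

Two programs landing in the same niche compete, so the archive size is bounded by the number of niches while still spanning the behavior space.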
Generation with Exploration Both a top-down generation phase, following the generative model induced by the library L, and a bottom-up generation phase, following the conditional neural model Q_L, take place at different stages of each iteration. Each generation phase encompasses an enumeration procedure and a sampling procedure.
argmax_ρ P(ρ | z, L) ∝ P(z | ρ) P(ρ | L)
where z denotes a niche, ρ a program, and L the library. The likelihood P(z | ρ) evaluates whether a program ρ fits in a given niche z, and the prior P(ρ | L) reflects how well it fits in the library, i.e. how likely it is under the generative model induced by L.
P(ρ | L, z) ≥ δ
where δ is a lower bound on the posterior. Adjusting the exploration ratio α ∈ [0, 1] (the ratio of sampled to enumerated programs) and the sampling temperature parameter τ (Boltzmann exploration) makes it possible to balance the efficiency pressure, as illustrated in our experiments below.
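The two exploration knobs above can be sketched together: filter candidates by the posterior bound δ, then sample among survivors with a Boltzmann distribution at temperature τ. This is an illustrative stand-in, working in log space, with a hypothetical function name; it is not OED's actual sampler.

```python
import math
import random

def boltzmann_sample(candidates, log_posteriors, tau=1.0, log_delta=-10.0, rng=random):
    """candidates: programs; log_posteriors: log P(rho | L, z) for each.

    Keeps only candidates whose log-posterior clears log_delta, then
    samples one proportionally to exp(log_posterior / tau).
    """
    kept = [(c, lp) for c, lp in zip(candidates, log_posteriors) if lp >= log_delta]
    if not kept:
        return None
    # Numerically stabilized softmax with temperature tau.
    m = max(lp for _, lp in kept)
    weights = [math.exp((lp - m) / tau) for _, lp in kept]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for (c, _), w in zip(kept, weights):
        acc += w
        if acc >= r:
            return c
    return kept[-1][0]
```

A high τ flattens the distribution (more novelty pressure); τ → 0 approaches greedy selection of the highest-posterior program (more efficiency pressure).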
Novelty Selection At each iteration, the most novel programs are selected from previously generated programs (top-down and bottom-up frontiers) to form the elites E, under the condition that their novelty score is above a novelty threshold δ_nov.
The local novelty score of a program ρ in a niche z is measured as the mean phenotypic distance to the k nearest archive elements of the same niche; in most of our experiments, the phenotypic distance relies on features from a pre-trained visual network (VGG-16; Simonyan and Zisserman, 2014).
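This local novelty score can be sketched directly from its definition (the embeddings stand in for the visual-network features; the function name and the empty-niche convention are illustrative assumptions):

```python
import math

def novelty_score(embedding, niche_embeddings, k=3):
    """Mean distance from a phenotypic embedding to its k nearest
    neighbors among the archive elements of the same niche."""
    if not niche_embeddings:
        return float("inf")  # an empty niche makes any program novel
    # Sort all distances and average the k smallest.
    dists = sorted(math.dist(embedding, e) for e in niche_embeddings)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)
```

A program is then kept as an elite when `novelty_score(...) > δ_nov`.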
Abstraction
In essence, the abstraction phase aims to grow the library from useful and novel building blocks found in elite programs across niches.
It first refactors the elites to find common program fragments, before abstracting these fragments into new library programs.
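A toy illustration of the refactoring step (programs represented as nested tuples; real library learning, as in DreamCoder, refactors under a description-length objective, so this only conveys the idea of proposing shared fragments as abstractions):

```python
from collections import Counter

def subtrees(expr):
    """Yield every subexpression of a program tree."""
    yield expr
    if isinstance(expr, tuple):
        for child in expr[1:]:
            yield from subtrees(child)

def propose_abstraction(elites):
    """Return the compound fragment shared by the most elites, or None."""
    counts = Counter()
    for program in elites:
        for s in set(subtrees(program)):  # count each fragment once per elite
            if isinstance(s, tuple):      # ignore bare primitives
                counts[s] += 1
    shared = [(s, n) for s, n in counts.items() if n >= 2]
    if not shared:
        return None
    # Prefer fragments shared by more elites, then larger fragments.
    return max(shared, key=lambda sn: (sn[1], len(str(sn[0]))))[0]
```

The chosen fragment would then be named, added to the library, and the elites rewritten to call it.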
Neural Training The neural model aims to learn how to compose the programmatic abstractions to create novel programs. It encodes a distribution over the elements of the library conditioned on the local context: taking as input a latent niche vector z and the current state s passed as an image (e.g. an image of the tower under construction), it outputs a probability tensor encoding the probability of the next token given the previous tokens of the growing program. Querying the neural model Q auto-regressively thus produces a program. It is trained both on replays, i.e. elites from previous iterations, and on dreams, i.e. programs sampled from the generative model. It plays the crucial role of biasing the search, as it drastically reduces the breadth of exploration in the combinatorially exploding space of programs.
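The auto-regressive querying can be sketched as follows; the `model` here is a hypothetical stand-in returning a token distribution, not OED's actual network, and the `<end>` token and signature are illustrative assumptions:

```python
import random

def sample_program(model, z, state, max_len=20, rng=random):
    """Query the model auto-regressively to produce a token sequence.

    model(z, state, tokens) -> dict mapping next-token candidates to
    probabilities, conditioned on the niche vector z, the state image,
    and the tokens emitted so far.
    """
    tokens = []
    for _ in range(max_len):
        probs = model(z, state, tokens)
        # Sample one token from the returned distribution.
        r = rng.random()
        acc = 0.0
        for tok, p in probs.items():
            acc += p
            if acc >= r:
                break
        if tok == "<end>":
            break
        tokens.append(tok)
    return tokens
```

At each step the distribution ranges over the current library, so newly abstracted functions immediately become available as tokens.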
As a first testbed, we demonstrate our approach on the Tower Domain introduced in DreamCoder (Ellis et al. (2021), Appendix S2.1.4). We rely on the same monadic, continuation-passing encoding of imperative programs. In this domain, programs produce a sequence of actions encoding the instructions of a hand moving over a 2D discretized canvas, which can drop horizontal or vertical blocks. The state of the hand includes both a 1D position and a boolean orientation. By default, the initial library is seeded with two basic control-flow primitives: for (for loop), and get/set, which gets and saves the current state of the agent's hand. We also provide movement-related and construction-related primitives: moveHand (advancing the hand), reverseHand (flipping the orientation), dropVerticalBlock, and dropHorizontalBlock. To define the niches, we adopt a 3D behavioral characteristic consisting of (1) the center of gravity (y-coordinate), (2) the width, and (3) the number of blocks of the tower structure generated from a program. The ecosystem's minimal criteria consist of lower and upper bounds on these values.
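The 3D behavioral characteristic for a tower can be computed directly from the blocks a program drops; this sketch assumes a simple (x, y, width, height) block representation, which is an illustrative choice rather than the paper's exact encoding:

```python
def behavior(blocks):
    """Compute (G, W, B) for a tower.

    blocks: list of (x, y, w, h) tuples with (x, y) the lower-left
    corner of each block on the discretized canvas.
    Returns the center-of-gravity height G (mass-weighted), the
    horizontal extent W, and the block count B.
    """
    if not blocks:
        return (0.0, 0, 0)
    masses = [w * h for _, _, w, h in blocks]
    centers = [y + h / 2 for _, y, _, h in blocks]
    g = sum(m * c for m, c in zip(masses, centers)) / sum(masses)
    left = min(x for x, _, _, _ in blocks)
    right = max(x + w for x, _, w, _ in blocks)
    return (g, right - left, len(blocks))
```

The minimal criteria then amount to checking that each component of the returned triple falls within its lower and upper bounds.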
The emphasis on implementation nested in the notion of innovation implies that candidate novel artifact-ideas will likely undergo strong environmental pressures (e.g. market, functional, or social pressures) when put to the test in the real world, heavily biasing the search space. It therefore seems legitimate to wonder whether strong inductive biases and environmental coupling are key to guiding the emergence of abstract structured knowledge and innovation. Generating and retaining 'stepping stones to everywhere' indeed seems too ambitious in such a combinatorially exploding space of programs in the absence of any environmental pressure (e.g. survival or functional constraints). Combining such a quest with some inductive bias appears to be a more realistic reflection of how symbolic knowledge formation occurs in social and natural systems. Moreover, it holds the potential for yielding more reliable and consistent innovations. We investigate different ways to bias evolution and guide the formation of symbolic knowledge, through either:
As illustrated in Figure 5, at the core of OED's search for diversity-enabling programs, and at the core of open-endedness, lies a delicate balance between environmental, efficiency, and novelty pressures.
These different pressures at play in selecting elites at each iteration may conflict.
Indeed, learning a language also means learning a strong inductive bias over the search space, which may impede diversity.
Convergent Evolution
Strong efficiency pressures, whether from environmental constraints (cf. (PHYSICS)) or from stronger language pressure (cf. (BASE)), or conversely a lower novelty pressure (as in the (A-N) run), seem to favor phenomena of convergent evolution, i.e. the repeatability of certain evolutionary outcomes.
Enhanced Exploration To mitigate the language bias, OED provides ways to loosen the efficiency pressure in favor of the novelty pressure, by promoting exploration in the generation step, notably by increasing the exploration ratio α weighting the sampling, or the sampling temperature τ. Early experiments on this enhanced exploration confirm that more diverse and novel programs are generated. However, by reducing the presence of regularities among elites, excessive exploration can impede the abstraction phase and therefore the growth of the library. Further work is needed to learn to adjust this balance automatically.
To assess the benefit of the different components of our model in the open-ended creation of diverse programs, we run several ablation studies:
While preliminary, our results suggest that neurally-guided library learning, enabling both more efficient program discovery and more sustained diversity, is a promising framing towards open-endedness. By building a library of programmatic abstractions, instead of solutions, our method preserves regularities in a more potent and versatile way than other diversity-oriented approaches. As we have observed, OED has the potential to guide the development of innovation using innate priors, biased metrics, and environmental constraints. Accentuating efficiency pressures, such as language bias and environmental factors, led to phenomena of convergent evolution. Library learning entails a growing amount of bias, which may have a negative impact on diversity. To mitigate this effect and foster creative divergence, it seems crucial to promote exploration and stochasticity, during both the generation and abstraction phases. However, by reducing the presence of regularities among elites, excessive exploration may impede the abstraction mechanism. Further investigation is necessary to determine the optimal balance between exploration and efficiency in library learning, which could be adjusted automatically depending on the task. The simplicity of the Tower Domain does not allow OED to reach its full potential for open-ended creation. To unleash its full capabilities, and to demonstrate the benefits of novelty-oriented library learning on downstream tasks, OED should be deployed in richer environments and at longer evolutionary scales. Another compelling development would be to use OED to produce increasingly sophisticated behaviors, rather than merely artifacts.
Email: clgl 'at' itu.dk Twitter: @claire__aoi
This project was funded by a DFF-Research Project grant (9131-00042B).