Hyperreal Enterprises: Roadmap

Preface

This document synthesises a Roadmap (and perhaps also a Product Roadmap) for Hyperreal Enterprises, Ltd. The Roadmap is being written inside Org Roam (an Emacs package), and displayed with Firn. The document can be thought of as a Wiki (and edited inside Emacs). From these sources, other downstream formats can be derived. A narrative overview follows.

To skip it, browse ahead to Top.

Introduction

Applying AI to technical fields is a huge opportunity.

It’s striking that in computer programming work, collective intelligence is used almost everywhere, but artificial intelligence is used almost nowhere. AI can kick butt at Chess and Go, but university courses are still taught by human professors. Heck, code is still written by human programmers. Surely, AI that could write code at a human level would be a valuable thing?

So where’s the human-level AI for coding?

A research agenda around “AI for programming” was already spelled out by Alan Turing in the late 1940s. It is only relatively recently that we have massive amounts of relevant data ready to work with. Code completion and code generation tools are some of the “low hanging fruit” in this domain. However, we think that something much more substantial is around the corner. (Some evidence: we’re not the only people thinking about this kind of thing.)

So we are looking into it.

Our evolving cast of characters brings together applied experience in Natural Language Processing (NLP), computational modelling, online communities, mathematical sciences ($x!$) as well as humanities and social science. We’ve been getting together for daily coffee chats, and sharing information and skills with each other in online seminars. Sometimes we bring in guest speakers. We trying to structure things so that as time goes by our chats will coalesce into notes, blog posts, papers, and working prototypes. We aren’t promising a schedule of deliverables. We’re all volunteers, typically with other paying jobs, sometimes in the process of looking for them! We’re doing this for fun and interest. We plan to share what we’re learning as we go along!

Broadly, the steps we have in mind move from data, to models, to AI agents.

We plan to document our progress (or lack thereof) on this blog. As a very rough outline:

  • [ ] We’d like to use contemporary information extraction methods to derive computationally meaningful material from sources like Stack Exchange Q&A, Github Issues, and programmers’ discussions, along with code. To do this, we will combine general purpose language models, like ELECTRA, and a knowledge graph approach.

  • [ ] We’d like to use category theoretic methods as a glue that can hold together a range of computational models, including models of programs and the process of computer programming. Monocl is one such an existing process modelling language that has been used to create an abstraction layer over a collection of data science programs: we’d like to generalise this approach.

  • [ ] Ultimately, we plan to install the models in computational agents who can then “converse with each other to sharpen their wits” (as Turing anticipated), mirroring contemporary developments in self-play. However, the specific design of the agents remains an open issue at the moment! Here, we’d like to explore techniques from Bayesian learning, logic programming, and reinforcement learning.

The path forward may be much less linear than these three bullet points would suggest.

Reflection is a big part of the process.

We’re interested in understanding human behaviour as well as technical topics. That goes for our own behaviour in particular. We plan to keep this wiki up to date and use it as a place to reflect honestly on how things are going. Although we’re grappling with some weighty topics, ‘success’ in this domain is not all or nothing! Writing here will mark our progress and be useful in their own right (e.g., blog posts can potentially feed into research papers or tools, or at minimum help us manage our workflow).

Progress so far

Alongside setting up a blog and drafting an initial anouncement (voilà!), we’ve updated the website of our affiliated UK-based company Hyperreal Enterprises with an agenda that matches the outlines of we’ve described here. As mentioned above, are having regular meetings, which we may record if they look likely to be interesting to a wider audience. Other stuff:

  • We’re working on tools and processes that can improve our efficacy, for example, we gave a short talk at EmacsConf 2020 about some of the Emacs-based data-and-computation wrangling tools we’ve been working on.

  • Also following up on EmacsConf we’ve started an additional weekly seminar, the “Emacs Research Group”, which has been quite disciplined about taking notes, and developing a focused plan of work.

  • We submitted a grant proposal (with colleagues at Oxford Brookes) to the EPSRC mathematical sciences small grants scheme that would support half a year of work on the data-facing part of the pipeline.

TL;DR

We are working on creating AI tools that will process open source information and enable applications such as automated computer programming.