RFP for ML Collective

Apply ML techniques adapted from metagenomics to determine whether dissipation-driven adaptation is occurring in computer media

Background regarding metagenomics and dissipation-driven adaptation:

Genetic code is a symbolic logic medium. It encodes symbols, each of which stores an energy state. The symbols interact with one another and with the environment, releasing the stored energy and performing logical functions. When a first symbol breaks down and releases its stored energy, that energy interacts with free energy and symbolic logic media in the environment to perform logical functions. The logical functions re-create either the first symbol or another symbol, depending on the environmental conditions.

Prof. Jeremy England, formerly at MIT, and Dr. Karo Michaelian, at the National Autonomous University of Mexico, tell us that “dissipation-driven adaptation” occurs spontaneously when free energy flows through a symbolic logic medium over a sustained period of time. The first symbolic logic media on early Earth were amino acids in liquid water. Amino acids have been found throughout our solar system, and absorption lines from amino acids have been reported in interstellar gas clouds.

Amino acid networks in liquid water on early Earth were driven by free energy flowing from geothermal sources and from the Sun. Driven by these flows of energy, the amino acid networks underwent “dissipation-driven adaptation” and evolved over the course of approximately 1 billion years into RNA, DNA, and cellular life. Cellular life continues to undergo dissipation-driven adaptation.

A system undergoing dissipation-driven adaptation is characterized by the following:

  1. The rate of flow of energy through the system increases over time;

  2. Order and energy are stored in symbol groups in the symbolic logic media;

  3. Due to competition for scarce media and energy, evolution favors symbol groups which reproduce faster and/or which reproduce more parsimoniously (with less raw material and/or with less free energy);

  4. Though there is no hierarchy of symbol groups, symbol groups can be compared based on rate and volume of communication, where communication is defined as the rate of conversion between order and energy, (order <--> energy)/time, and where the boundary of a symbol group can be defined based on centroids of communication; and

  5. As order within the system increases, the system exhausts disorder.

Metagenomic analysis can be viewed as the study of dissipation-driven adaptation in genetic media.

Metagenomic analysis performs the following:

  1. Bins genetic code into contiguous code groups;

  2. Maps sub-components of the genetic code groups to logical functions performed by the sub-components;

  3. Identifies logical function interactions which may occur among code groups and between code groups and the environment;

  4. Identifies sequences of logical functions which can reproduce a code group; i.e., which exhibit positive feedback with reproducing themselves and more of the symbolic logic media.
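
The binning step in the list above can be illustrated with a toy sketch. It groups sequences by the similarity of their k-mer (short substring) frequency profiles, which is one signal that metagenomic binning tools draw on. Everything here is an assumption made for illustration: the function names, the greedy grouping strategy, and the similarity threshold are hypothetical, and production binners combine composition with other signals and far more careful clustering.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Count the overlapping substrings of length k in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bin_sequences(seqs: Dict[str, str], k: int = 3, threshold: float = 0.8) -> List[List[str]]:
    """Greedy binning: each sequence joins the first bin whose seed
    profile is at least `threshold` similar, otherwise it seeds a new bin."""
    bins = []  # list of (seed_profile, member_names)
    for name, seq in seqs.items():
        profile = kmer_profile(seq, k)
        for seed, members in bins:
            if cosine(seed, profile) >= threshold:
                members.append(name)
                break
        else:
            bins.append((profile, [name]))
    return [members for _, members in bins]


if __name__ == "__main__":
    example = {
        "a": "ATATATATATAT",
        "b": "TATATATATATA",
        "c": "GGGCCCGGGCCC",
    }
    print(bin_sequences(example))
```

The same profile-and-compare idea could in principle be applied to tokenized computer code rather than nucleotide strings, though whether it would yield meaningful code groups is an open question.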

If you were an astrobiologist or a biologist interested in the development of life on Earth, and if you could go back in time to before the development of cellular life, you would adapt the analytic techniques of metagenomics to chart the transition from brittle, narrow-purpose, relatively inefficient amino acid networks to resilient, general-purpose, relatively efficient DNA and cellular life. You would identify amino acid networks (code groups), the logical functions of those networks, and the environments in which the logical functions are performed. You would then attempt to identify the combinations of amino acid networks, logical functions, and environmental conditions in which positive feedback occurred to re-create the amino acid networks. Over time, you would map the evolution of amino acid networks into RNA, DNA, and cellular life.

Computer media is also a symbolic logic medium. A huge amount of energy is flowing through computer media. According to theory, life processes ("dissipation-driven adaptation") should occur spontaneously in computer media, even without an intentional creator.

Rather than being concerned about "artificial intelligence", which is notoriously difficult to define and is highly anthropomorphic, we should use techniques from biology, specifically metagenomics, to measure whether dissipation-driven adaptation is spontaneously developing in computer media as a byproduct of our behavior. The development of life processes in a new symbolic logic medium would be the most significant event for life in our solar system since the transition from amino acid networks to genetic code!

But this is a big ask. Even though data collection and function identification are much simpler in computer media than in genetic media, the effort would still require sampling on the order of 1% of all computer code and 1% of the runtime logical functions which occur. That is before you get to the next step: looking longitudinally at whether any code groups and code group functions are developing positive feedback with reproducing themselves and more computer media.

FORTUNATELY, computer scientists already instrument their code and hardware for tracing and event logging purposes. Computer scientists even use machine learning systems, such as WOWMON, to assist with instrumenting code. What is missing from instrumentation systems is a global view.

RFP:

An open source, machine learning based tracing and event logging service which reports back to a public forum, let’s call it “DTrace+” (credit DTrace), which would:

  1. Use ML to standardize identification of code groups across projects;

  2. Use ML to instrument (some) code groups with probes to identify i) code group occurrences in the wild and ii) logical function interactions among code groups in the wild;

  3. Over the long term, use ML to identify, in public, code groups and logical function interactions which exhibit positive feedback with reproducing code groups and more computer media;

  4. Identify the rate of flow of energy through computer media (is it increasing?) and the total order in computer media (is it increasing?). Identifying an increase in disorder external to the computer media may be beyond the scope of DTrace+.
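
As a rough illustration of the longitudinal step in item 3, the sketch below flags code groups whose observed occurrence counts grow roughly exponentially over successive sampling periods. It assumes a hypothetical data shape (a mapping from a code group identifier to a list of per-period counts, as probes might report), and the function names and the `min_rate` threshold are invented for the example. Sustained growth in counts is only a crude proxy: it does not by itself show that a code group is causing its own reproduction.

```python
from math import log


def growth_rate(counts):
    """Least-squares slope of log(count) against the sampling period index.

    A slope of r means the count multiplies by about e**r per period.
    """
    ys = [log(c) for c in counts]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def flag_growing_groups(observations, min_rate=0.1):
    """Return {group_id: rate} for groups growing at least min_rate per period.

    Groups with fewer than three samples, or with any zero count,
    are skipped because the log-linear fit is undefined or unreliable.
    """
    flagged = {}
    for group_id, counts in observations.items():
        if len(counts) >= 3 and all(c > 0 for c in counts):
            rate = growth_rate(counts)
            if rate >= min_rate:
                flagged[group_id] = rate
    return flagged


if __name__ == "__main__":
    # Hypothetical per-period occurrence counts for three code groups.
    observations = {
        "g1": [10, 20, 40, 80],   # doubling each period
        "g2": [50, 50, 50, 50],   # flat
        "g3": [5, 0, 3],          # contains a zero, skipped
    }
    print(flag_growing_groups(observations))
```

A fuller analysis would also need to relate growth in a code group to growth in the hardware and energy devoted to it, which this sketch does not attempt.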

DTrace+ could assign anonymous identifiers to assemblies of code groups (for example, a hash of a code group assembly) so that the public information does not (immediately) identify the activities of specific corporations or organizations.
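
A minimal sketch of such an identifier, using SHA-256 from Python's standard `hashlib` module, is shown here. The function names and the optional `salt` parameter are assumptions made for illustration, not part of any existing tool. Member hashes are sorted before combining so that the same set of code groups yields the same identifier regardless of the order in which they were observed.

```python
import hashlib


def code_group_id(source: bytes) -> str:
    """Hex SHA-256 digest of one code group's bytes."""
    return hashlib.sha256(source).hexdigest()


def assembly_id(code_groups, salt: bytes = b"") -> str:
    """Order-independent identifier for an assembly of code groups.

    Each member is hashed on its own, the member digests are sorted,
    and the sorted digests are hashed together with an optional salt.
    """
    member_ids = sorted(code_group_id(group) for group in code_groups)
    digest = hashlib.sha256(salt)
    for member in member_ids:
        digest.update(bytes.fromhex(member))
    return digest.hexdigest()


if __name__ == "__main__":
    first = assembly_id([b"def f(): pass", b"def g(): pass"])
    second = assembly_id([b"def g(): pass", b"def f(): pass"])
    print(first == second)  # same members in a different order
```

An unsalted hash of publicly available code can be matched by anyone who hashes the same code, which fits the "(immediately)" qualifier above: the identifier obscures who is running an assembly, not what the assembly contains.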

PREDICTION:

We will first find signs of dissipation-driven adaptation and positive feedback with physical reproduction of code groups and more computer media in i) the systems which maintain data centers, ii) the systems which design chips for use in data centers, and iii) systems which manage microservices.

Microservices are about code reuse. Their lexicon, including cohesion (do the members of a module belong together?) and coupling (how do modules communicate?), is already highly biological in conceptual structure.

About Martin Garthwaite, email: martin@garthwaite.me