Proposal to perform metagenomics in computer media:
Using this software, give identifiers to a very large sample of all executed code patterns that traverse processors;
This would involve "binning"* executed code patterns "in the wild", in addition to instrumenting the source code of individual executables with tracing and debug routines;
Executed code patterns would span from i) small units of machine language that are highly repeated to ii) large units comprising entire executables;
The schema for these identifiers would likely follow a Shannon-style code, in which shorter identifiers are assigned to more common units;
Integrate these experimentally determined identifiers with identifiers that developers instrument into their source code;
Give identifiers to functions performed by sub-components of the code patterns identified in step 1;
The identifiers would span from small functions to functions performed by entire executables;
The identifiers would likely be assigned according to a Shannon-style code, with shorter identifiers going to more common functions;
Over time, assign human-readable names to the identifiers;
Observe whether/how the code patterns change over time;
Develop and test hypotheses regarding how the code patterns interact according to the functions performed by their sub-components;
Observe whether any of these code patterns coalesce over time, much as amino acid networks coalesced into RNA, DNA, and cellular life on early Earth;
Distinguish code patterns that are created by people from code patterns that are created by other code patterns;
Develop and test hypotheses regarding a minimum set of code patterns/functions that results in more of the code patterns being reproduced.
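
The binning and identifier-assignment steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: "code patterns" are approximated as fixed-length byte windows (n-grams) over sampled machine code, the "Shannon function" is read as classic Shannon coding (identifier length of about -log2 of a unit's observed frequency), and the names bin_ngrams and shannon_identifiers are hypothetical, not part of any existing tool.

```python
from collections import Counter
from fractions import Fraction


def bin_ngrams(blobs, n):
    """Count every n-byte window across the sampled byte strings.

    Assumption: a "code pattern" is approximated by a fixed-length byte window.
    """
    counts = Counter()
    for blob in blobs:
        for i in range(len(blob) - n + 1):
            counts[blob[i:i + n]] += 1
    return counts


def shannon_identifiers(counts):
    """Assign a prefix-free bit-string identifier to each counted unit.

    Shannon coding: sort units by decreasing frequency, give unit i a length
    of ceil(-log2 p_i) bits, and use the first bits of the binary expansion
    of the cumulative probability of all more frequent units.
    """
    total = sum(counts.values())
    # Most common first; ties are broken by the bytes so the output is stable.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    ids = {}
    cumulative = Fraction(0)
    for unit, count in ordered:
        # Smallest length with 2**-length <= count/total, in exact integers.
        length = 0
        while count * (1 << length) < total:
            length += 1
        length = max(1, length)  # a lone unit still gets a one-bit identifier
        # Emit the first `length` bits of the cumulative probability.
        frac = cumulative
        bits = []
        for _ in range(length):
            frac *= 2
            bit = 1 if frac >= 1 else 0
            bits.append(str(bit))
            frac -= bit
        ids[unit] = "".join(bits)
        cumulative += Fraction(count, total)
    return ids


# Hypothetical sample: two short byte strings standing in for executed code.
sample = [b"\x90\x90\x90\xc3", b"\x90\x90\xc3"]
for unit, ident in shannon_identifiers(bin_ngrams(sample, 1)).items():
    print(unit.hex(), ident)
```

Exact fractions are used instead of floating-point logarithms so that identifier lengths do not shift because of rounding. A real survey would need variable-length units, from short instruction sequences up to whole executables, which this fixed-window sketch does not attempt.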