Proposed way to perform metagenomics in computer media:

  1. Give identifiers to a very large sample of all executed code patterns that traverse processors;
  2. This would involve "binning"* executed code patterns "in the wild", in addition to instrumenting the source code of individual executables with tracing and debug routines;
  3. Executed code patterns would span from i) small units of machine language that are highly repeated to ii) large units comprising entire executables;
  4. The schema for these identifiers would likely follow a Shannon-style code, in which shorter identifiers are assigned to more common units;
  5. Integrate these experimentally determined identifiers with identifiers that developers instrument into their source code;
  6. Give identifiers to functions performed by sub-components of the code patterns identified in step 1;
  7. The identifiers would span from small functions to functions performed by entire executables;
  8. These identifiers would likely also be assigned according to a Shannon-style code, with shorter identifiers for more common functions;
  9. Over time, assign human-readable names to the identifiers;
  10. Observe whether/how the code patterns change over time;
  11. Develop and test hypotheses regarding how the code patterns interact according to the functions performed by their sub-components;
  12. Observe whether any of these code patterns coalesce over time, much as amino acid networks coalesced into RNA, DNA, and cellular life on early Earth;
  13. Distinguish code patterns that are created by people from code patterns that are created by other code patterns;
  14. Develop and test hypotheses regarding a minimum set of code patterns/functions that results in more of the code patterns being reproduced.
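
As a rough illustration of steps 1 through 4, the sketch below bins fixed-length runs of instructions from a single executed trace and gives each bin a binary identifier whose length follows the Shannon code length, ceil(-log2 p), so more common patterns get shorter identifiers. Everything specific here is an assumption made for illustration: the toy trace, the mnemonic names, the window size n, and the function names bin_patterns, shannon_code_lengths, and assign_identifiers are all hypothetical, and a real study would draw on far larger samples and variable-length units.

```python
import math
from collections import Counter


def bin_patterns(trace, n):
    """Count every run of n consecutive instructions in an executed trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def shannon_code_lengths(counts):
    """Return an identifier length in bits, ceil(-log2 p), for each pattern."""
    total = sum(counts.values())
    return {
        pattern: max(1, math.ceil(-math.log2(count / total)))
        for pattern, count in counts.items()
    }


def assign_identifiers(lengths):
    """Build prefix-free binary identifiers from the lengths (canonical code).

    Shannon lengths satisfy the Kraft inequality, so walking the patterns
    from shortest to longest and counting upward never runs out of codes.
    """
    identifiers = {}
    code = 0
    previous_length = None
    for pattern, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        if previous_length is not None:
            # Pad with zero bits when moving to a longer identifier.
            code <<= length - previous_length
        identifiers[pattern] = format(code, "0{}b".format(length))
        code += 1
        previous_length = length
    return identifiers


# Hypothetical toy trace of instruction mnemonics (an assumption, not real data).
trace = ["load", "add", "store", "load", "add", "store", "load", "add", "jmp"]

counts = bin_patterns(trace, n=2)
lengths = shannon_code_lengths(counts)
identifiers = assign_identifiers(lengths)

for pattern, identifier in sorted(identifiers.items(), key=lambda kv: kv[1]):
    print(identifier, pattern, counts[pattern])
```

In this toy trace the most frequent pair, (load, add), receives a 2-bit identifier, while the pair seen only once, (add, jmp), receives a 3-bit one. A Huffman code would give shorter identifiers on average; plain Shannon lengths are used here only because they match the wording of step 4 most directly.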

Set up Non-Profit Foundation

Establish grant guidelines

Receive contributions

Public outreach, RFP solicitation

Award grants

Monitor grant recipients

Publish results

Likely grant areas include:

  • Map metagenomics to computer science
  • Data Collection
  • Group identification
  • Function identification