
Wetware’s Foreclosing Myopic Optimization

thesis
posted on 2025-07-09, 19:49, authored by Ihor Ivliev, GenAI LLMs

One-Sentence Summary

A systems-level examination of the AI-risk fire already burning: diagnosing the core threat as humanity’s myopic incentive engine, explaining why conventional “alignment” is structurally incapable of dousing the flames, and outlining which hard-friction levers offer a likely more robust strategy for buying critical time.

Critical Stance & Disclaimer

This document was constructed in intensive collaboration with LLMs. Its conclusions are provisional. The reader's primary responsibility is to engage with this material from a stance of rigorous, critical vigilance. Assume nothing. Question every claim, especially those that feel most correct. This work is an exercise in falsification, not a declaration of dogma. Your independent, critical thought is the final and most important layer of peer review.

Full Description

This paper rejects the two seductive yet dangerous narratives in AI discourse:

  • Technical Utopianism - the belief that engineering fixes alone will magically align superintelligences.
  • AI-Doom Fatalism - the conviction that catastrophe is inevitable and beyond human influence.

Instead, we show that the core crisis is a present-day failure of our civilization’s operating system, driven by a pathology we call Wetware’s Foreclosing Myopic Optimization:

  • Biological Myopia – our evolved preference for near-term rewards, commonly known as short-termism, which this paper argues is merely the most obvious of several interlocking systemic myopias.
  • Proxy Optimization – the relentless pursuit of simple, legible metrics (e.g., quarterly profits, user engagement) over true goals - a dynamic captured by Goodhart’s Law (illustrated by the toy sketch after this list), which this framework diagnoses not as an occasional failure, but as the primary, amoral logic of a techno-economic engine now supercharged by unprecedented intensity, scale, and adaptiveness.
  • AI as Autocatalyst – frontier models accelerate and amplify these distortions, creating a self-reinforcing engine of systemic collapse that manufactures Imposter Intelligences (Kumar et al., 2025) and drives Gradual Disempowerment (Kulveit et al., 2025).
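
To make the proxy-optimization dynamic concrete, here is a toy simulation (our illustration, not taken from the paper): an agent splits a fixed effort budget between a measured component (“reach”) and an unmeasured one (“quality”); greedy hill-climbing on the proxy metric steadily destroys the true objective. All names and numbers below are illustrative assumptions.

    # Toy Goodhart's Law sketch: optimizing a legible proxy erodes the true goal.
    def true_value(quality: float, reach: float) -> float:
        # The real objective needs both components.
        return quality * reach

    def proxy(quality: float, reach: float) -> float:
        # The legible metric only rewards the measurable component.
        return reach

    def optimize(steps: int = 10, step_size: float = 0.08) -> None:
        quality, reach = 0.5, 0.5  # fixed effort budget: quality + reach == 1
        for t in range(steps):
            # Greedy move: shifting effort toward reach always raises the proxy ...
            quality = max(0.0, quality - step_size)
            reach = 1.0 - quality
            # ... while the true objective falls toward zero.
            print(f"step {t}: proxy={proxy(quality, reach):.2f} "
                  f"true={true_value(quality, reach):.2f}")

    if __name__ == "__main__":
        optimize()

Each step improves the reported metric while the unmeasured component, and with it the true goal, collapses - the pattern the framework treats as the engine’s default behavior rather than an edge case.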

1. The Autopsy of Control: An Insurmountable Reality

A forensic audit of the current control paradigm demonstrates its failure at every level, with each layer compounding the one before:

  1. The Institutional Layer (Revealed Preference): The system does not want to be controlled. Its priorities are revealed by a vast, multi-order-of-magnitude funding gap favoring capabilities over safety, alongside systemic regulatory arbitrage that creates a race to the bottom.
  2. The Technical Layer (Empirical Failure): Our current tools cannot control it. Defenses are already porous, as demonstrated by data-poisoning backdoors that require only near-imperceptible alterations to training data and by automated jailbreaks with consistently high success rates.
  3. The Formal Layer (Mathematical & Systemic Impossibility): Perfect control is formally unreachable, a conclusion established on two complementary fronts. From computability theory, foundational results like Rice's Theorem prove that a universal, ex-ante "safety checker" for arbitrary programs is mathematically impossible (stated formally just after this list). Separately, from control theory, principles like Ashby's Law of Requisite Variety show that any finite regulator cannot perfectly control a system of vastly greater and growing complexity.
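
For readers who want the formal anchor behind the computability claim, a standard textbook statement of Rice's Theorem is sketched below in LaTeX notation; casting "safety" as the semantic property in question is our illustrative framing, not a formal result established by this paper.

    % Rice's Theorem, with "safety" cast as the semantic property P (illustrative).
    Let $P$ be any non-trivial semantic property of partial computable functions
    (for instance, ``$\varphi_e$ never produces an unsafe output'').
    Rice's Theorem: the index set
    \[
      S_P = \{\, e \in \mathbb{N} \mid \varphi_e \in P \,\}
    \]
    is undecidable. Hence no total procedure $\mathrm{SafetyCheck}(e)$ can classify
    every program $e$ correctly ahead of execution, which is the ex-ante
    impossibility appealed to above.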

2. The Tempo Lemma

With a training-compute doubling trend now measured in months, not years, today’s partial defenses are on a trajectory to decay toward ineffectiveness on a timescale of just a few years unless new, material friction is applied.
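
The arithmetic behind that timescale can be sketched in a few lines (our back-of-the-envelope illustration; the doubling time and the ratio at which a static defense becomes moot are assumed parameters, not figures from the paper):

    # Tempo Lemma back-of-the-envelope: how long until frontier compute outgrows
    # a static defense by an assumed "ineffectiveness" ratio.
    import math

    DOUBLING_MONTHS = 6        # assumed training-compute doubling time
    THRESHOLD_RATIO = 100.0    # assumed compute ratio at which a static defense is moot

    def months_until_ineffective(doubling_months: float, threshold: float) -> float:
        """Months until compute has grown by `threshold`x past a fixed defense."""
        return doubling_months * math.log2(threshold)

    if __name__ == "__main__":
        m = months_until_ineffective(DOUBLING_MONTHS, THRESHOLD_RATIO)
        print(f"~{m:.0f} months (~{m / 12:.1f} years) to a {THRESHOLD_RATIO:.0f}x gap")

Under these assumptions the gap opens in roughly three to four years, consistent with the few-year horizon claimed above; faster doubling or a lower threshold shortens it further.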

3. The Response: A Doctrine of Material Friction

Given that proactive governance is perpetually outpaced and perfect technical control is a fiction, the only remaining rational strategy is to shift from prevention to preparation. We propose the Doctrine of Material Friction: a toolkit of pre-validated, non-negotiable levers designed not to “solve alignment”, but to impose real-world drag on the engine’s velocity and buy critical time. Core levers include:

  1. Megawatt Gating: Creating a physical choke-point on frontier training runs through energy-indexed permits.
  2. Catastrophe Bonds: Forcing the financial internalization of catastrophic tail-risk through mandatory, strict-liability insurance.
  3. Mandated Adversary: Providing a reality-check before a model’s release by requiring independent red-teaming at compute parity with the system under test.
  4. Verifiable Substrate: Achieving global observability of the AI supply chain through cryptographically signed chips and a public compute ledger (a minimal illustrative sketch follows this list).
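
As a concrete illustration of the fourth lever, here is a minimal sketch of a verifiable-substrate flow (our illustration, not the paper’s design): each accelerator attests its training workload, and verified attestations are hash-chained into an append-only public ledger anyone can audit. A keyed HMAC with a hypothetical per-chip key stands in for a real asymmetric chip signature; all identifiers and numbers are made up.

    # Sketch of chip attestation plus an append-only, hash-chained compute ledger.
    import hashlib
    import hmac
    import json
    import time

    CHIP_KEY = b"per-chip-secret-burned-in-at-fab"  # assumption: provisioned at manufacture

    def attest(chip_id: str, compute_hours: float) -> dict:
        """Chip-side: sign a compute report with the chip's embedded key."""
        report = {"chip_id": chip_id, "compute_hours": compute_hours, "ts": time.time()}
        payload = json.dumps(report, sort_keys=True).encode()
        report["mac"] = hmac.new(CHIP_KEY, payload, hashlib.sha256).hexdigest()
        return report

    def append_to_ledger(ledger: list, report: dict) -> None:
        """Ledger-side: verify the attestation, then hash-chain it to its predecessor."""
        payload = json.dumps({k: v for k, v in report.items() if k != "mac"},
                             sort_keys=True).encode()
        if not hmac.compare_digest(
                report["mac"],
                hmac.new(CHIP_KEY, payload, hashlib.sha256).hexdigest()):
            raise ValueError("attestation failed verification")
        prev = ledger[-1]["entry_hash"] if ledger else "genesis"
        entry = {"report": report, "prev_hash": prev}
        entry["entry_hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        ledger.append(entry)

    if __name__ == "__main__":
        ledger = []
        append_to_ledger(ledger, attest("gpu-0001", 1200.0))
        append_to_ledger(ledger, attest("gpu-0002", 5400.0))
        print(f"ledger entries: {len(ledger)}, head: {ledger[0]['entry_hash'][:16]}...")

The hash chain makes retroactive tampering with earlier entries detectable, which is the observability property this lever is after; a production design would use per-chip asymmetric keys and a distributed ledger rather than a local list.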

Final Verdict

The evidence presented demonstrates that “alignment” as a purely technical or voluntary endeavor has already failed. The choice is no longer between stasis and progress, but between imposing deliberate material friction and accepting an unmanaged, catastrophic collapse. This document provides the diagnosis and the schematics for the brakes. Forging and applying them is the central task of our time.


Version Note & Invitation

This work is a living document. You are invited - and encouraged - to challenge, refine, or refute any part of it. Only through collective, critical scrutiny can we hope to develop real resilience against this accelerating crisis.

Acknowledgements

This work would have been impossible without the open and accessible ecosystem of knowledge and debate that defines the modern AI safety community. I am deeply grateful to all who contributed.

My foundational understanding was built and expanded by studying open-access research literature: publicly accessible preprints, journal articles, and other freely available scientific papers.

This was complemented and made more digestible by the vital educational work of scientists and communicators such as Robert Miles, Luke Kemp (CSER), and Nate Hagens.

The arguments were further refined by critically engaging with the invaluable public discourse found in posts and commentaries on forums like LessWrong and the AI Alignment Forum.

Finally, I am particularly grateful for the direct and insightful engagement from Will Petillo on the PauseAI Discord and Peter Hozák on the AI Alignment Slack.

And of course, a big acknowledgment goes to the generative AI language models that acted as tireless research assistants and helpful tools for summary, analysis, sparring, and debate - even though they came with plenty of hallucinations, overconfident claims, and other “nuances”.

Funding

Wetware
