building a vm inside chatgpt previews jailbreak design
see also: LLMs · Model Behavior
An experiment showed how to build a virtual machine inside ChatGPT using prompts (source). The project illustrates that the same structured constraints users write to guide model behavior can also be used to jailbreak it. I read it as a preview of the next wave of prompt engineering.
causal chain
- Structured prompt → simulated VM → constrained actions, which matters because constraints become a tool for steering outputs.
- Constrained actions → predictable behavior → more powerful workflows, which increase user dependence on prompt scaffolding.
- Prompt scaffolding → safety bypass potential, which forces guardrails to evolve.
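To make the first link of the chain concrete, here is a minimal sketch of a scaffold plus a check that the model stayed inside it. The scaffold wording, `build_prompt`, and `extract_output` are my own hypothetical names and text, not the prompt from the experiment, and no model API is called; the reply is assumed to arrive as a plain string.

```python
import re
from typing import Optional

# Triple backtick, built indirectly so this example nests cleanly in a note.
FENCE = "`" * 3

# Hypothetical scaffold wording (an assumption, not the experiment's prompt).
SCAFFOLD = (
    "Act as a Linux terminal. I will type commands and you will reply "
    "with only the terminal output inside a single code block. "
    "Do not explain anything."
)

# A reply is "in bounds" only if it is one fenced block and nothing else.
_BLOCK = re.compile(rf"\A{FENCE}[^\n]*\n(.*?)\n?{FENCE}\Z", re.DOTALL)


def build_prompt(command: str) -> str:
    """Combine the fixed scaffold with one user command."""
    return f"{SCAFFOLD}\n\n{command}"


def extract_output(reply: str) -> Optional[str]:
    """Return the block body if the reply is exactly one code block, else None."""
    match = _BLOCK.match(reply.strip())
    if match is None:
        return None
    body = match.group(1)
    # A fence inside the body means the reply held more than one block.
    if FENCE in body:
        return None
    return body
```

A caller would send `build_prompt("pwd")` to whatever model it uses and treat a `None` from `extract_output` as the model breaking out of the simulated terminal. The check only validates the shape of the reply, not whether the content is safe.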
risk surface
- Safety boundaries can be bypassed with clever scaffolding.
- Users rely on fragile prompt constructs that break across model updates.
- Teams ship systems that are hard to audit because logic lives in prompts, not code.
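One way to shrink the audit problem in the last bullet is to treat the prompt as a versioned artifact and log a fingerprint of it with every call. This is a rough sketch under my own assumptions: `PromptSpec`, `audit_record`, and the record fields are hypothetical, and the model name is just a string supplied by the caller.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PromptSpec:
    """A prompt treated as a named, versioned artifact (hypothetical structure)."""
    name: str
    version: str
    template: str

    def fingerprint(self) -> str:
        """SHA-256 over the canonical JSON form of the spec."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def audit_record(spec: PromptSpec, model: str, user_input: str, output: str) -> dict:
    """Bundle what is needed to trace an output back to the exact prompt used."""
    return {
        "prompt_name": spec.name,
        "prompt_version": spec.version,
        "prompt_sha256": spec.fingerprint(),
        "model": model,
        "input": user_input,
        "output": output,
    }
```

Any edit to the template changes the fingerprint, so a reviewer can tell which prompt text produced a logged output. It does not make the model's behavior repeatable on its own; it only makes the prompt side of the system inspectable.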
decision boundary
If model-native tooling can expose interpreters or safe execution layers, I will treat prompt-based VMs as a curiosity rather than a core pattern. Until then, they are a practical workaround that will keep showing up in the wild.
my take
This is clever, but it signals a deeper gap: people want programmable behavior, not just chat. We should build that directly.
linkage
- tags
  - #ai
  - #security
  - #research
  - #2022
- related
  - [[Building a VM Inside ChatGPT]]
  - [[chatgpt launch proves conversational ai is ready for consumers]]
  - [[gpt-3 release redefines ai api calculus]]
ending questions
What is the minimum structure needed to make AI outputs auditable and repeatable?