Monday, June 30, 2025

Zero-Knowledge Proofs: Verifying Computation and Preserving Privacy

Expanding Verifiability Beyond Merkle Trees in the Age of AI

From my previous post, you're already familiar with Merkle Trees as a powerful data structure for efficient and secure validation of contents. You know they achieve this by hashing data and then hashing those hashes up a tree, allowing for Merkle proofs that can verify data inclusion or consistency without revealing the entire dataset. This capability is becoming even more crucial as the spread of AI-generated content makes verifiable content increasingly important.

But what if you need to prove something more complex than just data inclusion? What if you need to prove a computation was performed correctly, or that you know a secret, without revealing that secret? This is where Zero-Knowledge Proofs (ZKPs) come into play, offering new dimensions of verifiability and privacy.

What are Zero-Knowledge Proofs?

A Zero-Knowledge Proof is a cryptographic protocol where a prover can convince a verifier that a statement is true, without revealing any information beyond the truth of the statement itself. Think of it like proving you're over 18 without showing your ID or revealing your name and address. ZKPs bring two main "primitives" or building blocks:

  • Computational Integrity (Succinctness): They allow you to create proofs of computations that are significantly easier and faster to verify than re-running the original computation. The proof itself stays small even as the computation being proven grows more complex. Just as a Merkle proof is small compared to the original data, a ZKP is small compared to the computation it verifies.
  • Zero-Knowledge (Privacy): They provide the option to hide parts of the computation (like sensitive inputs or even parts of the model) while still proving its correctness.
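To make the idea of proving you know a secret without revealing it concrete, here is a minimal sketch of a Schnorr proof of knowledge, made non-interactive with the Fiat-Shamir heuristic. This is a different and much simpler protocol than the general-purpose proof systems discussed in this post, and it is not succinct in the sense above. The group parameters P, Q and G are toy values chosen for illustration and are far too small to be secure; the function names are my own.

```python
import hashlib
import secrets

# Toy group parameters (assumed for illustration only, NOT secure):
# P is a safe prime, Q = (P - 1) // 2 is prime, and G generates the
# subgroup of order Q inside the integers modulo P.
P = 2039
Q = 1019
G = 4


def challenge(public_key: int, commitment: int) -> int:
    """Derive the challenge by hashing the transcript (Fiat-Shamir)."""
    data = f"{G}|{P}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> tuple[int, int, int]:
    """Return (public_key, commitment, response) for 'I know the secret'."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q - 1) + 1  # fresh random nonce in [1, Q - 1]
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, commitment, response


def verify(public_key: int, commitment: int, response: int) -> bool:
    """Check G^response == commitment * public_key^challenge (mod P)."""
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P
```

Calling prove(123) returns three numbers that verify(...) accepts, yet the verifier only ever sees the public key, the commitment and the response, never the secret 123 itself. Because the nonce is random, proving the same secret twice yields different proofs.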

While generating ZK proofs can be very computationally intensive, advancements in cryptography, hardware, and distributed systems are making them feasible for increasingly complex computations. This expansion of capabilities opens up a vast "design space for new applications".

Programming ZKPs: A Shift in Mindset

Unlike traditional programming, which focuses on how to compute a result, programming ZKPs means writing programs (often called circuits) that define a set of constraints. These constraints are mathematical rules that the computation must satisfy. For example, you might constrain that two secret numbers multiplied together equal a public number, without ever revealing the secret numbers.
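The difference between computing and constraining can be sketched in a few lines of Python. The function below does not calculate anything for the caller; it only checks whether a proposed assignment of values satisfies the rule. The modulus FIELD_PRIME is a toy value I chose for illustration (real proof systems work in prime fields of roughly 254 bits), and the function name is hypothetical.

```python
# Toy prime field modulus (an assumption for illustration; real proof
# systems use much larger primes).
FIELD_PRIME = 101


def product_constraint_holds(a: int, b: int, c: int) -> bool:
    """Check the single constraint a * b == c in the toy field."""
    return (a * b - c) % FIELD_PRIME == 0


public_c = 15
print(product_constraint_holds(3, 5, public_c))   # True: one valid secret pair
print(product_constraint_holds(15, 1, public_c))  # True: another valid pair
print(product_constraint_holds(4, 4, public_c))   # False: constraint violated
```

Note that more than one secret pair satisfies the same public value. A proof shows that the prover knows some satisfying pair, not which one.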

The typical workflow for building a ZKP involves:
  1. Writing the circuit: Defining the constraints of your computation.
  2. Building the circuit: Compiling it into a binary constraint file and a WebAssembly (Wasm) module.
  3. Trusted Setup: A crucial pre-processing step that generates a proving key (for the prover) and a verification key (for the verifier). 
  4. Generating the proof: Using your private inputs (the "witness"), the compiled circuit, and the proving key.
  5. Verifying the proof: Using the verification key, the public output, and the generated proof.

Concepts like hash functions are fundamental in ZKPs, just as they are in Merkle Trees. However, ZKPs often use "ZK-Friendly hash functions" like Poseidon, which are optimized for use within ZKP circuits. Because they are built from field arithmetic rather than bit-level operations, they offer significant performance gains inside a circuit compared to traditional hashes like SHA-256.

Commitments, a cryptographic primitive allowing you to "commit" to a secret value without revealing it, are also crucial, often built using these hash functions. These are key building blocks for applications like digital signatures and more advanced concepts like group signatures, where you can prove you are part of a group without revealing your specific identity.
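As a rough illustration of a hash-based commitment, the sketch below commits to a value by hashing it together with a random blinding value, and later opens the commitment by revealing both. It uses SHA-256 from Python's standard library as a stand-in, since Poseidon is not available there; inside a circuit you would typically use a ZK-friendly hash instead. The function names are my own.

```python
import hashlib
import hmac
import secrets


def commit(value: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, blinding) for the given secret value."""
    # A fixed-length random blinding value hides low-entropy secrets
    # and keeps the concatenation below unambiguous.
    blinding = secrets.token_bytes(32)
    commitment = hashlib.sha256(blinding + value).digest()
    return commitment, blinding


def open_commitment(commitment: bytes, value: bytes, blinding: bytes) -> bool:
    """Check that (value, blinding) is a valid opening of the commitment."""
    expected = hashlib.sha256(blinding + value).digest()
    return hmac.compare_digest(commitment, expected)
```

The commitment can be published right away. Anyone holding it can later check an opening with open_commitment, and a different value will not match unless the committer finds a hash collision.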

Programming ZKPs: An Example

Let's walk through a basic example of proving that we know two numbers whose product is 36 without revealing what those numbers are.

Write the circuit using Circom. The line c <== a * b; is the constraint: two numbers multiplied together equal a third number.

Circom compiles the circuit into two artifacts. The first is a Wasm file, which we'll use to generate a witness from our private inputs when creating the proof. The second is a Rank-1 Constraint System (R1CS) binary file, which mathematically defines our single constraint.
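To give a feel for what a Rank-1 Constraint System expresses, here is a hand-written Python sketch of one possible encoding of our single constraint. Each constraint has the form (A·w) * (B·w) = (C·w), where w is the witness vector. The witness layout [1, c, a, b], the toy modulus FIELD_PRIME, and all names are assumptions for illustration; the exact encoding and binary format Circom produces may differ.

```python
# Toy prime field modulus (assumption; real systems use much larger primes).
FIELD_PRIME = 101

# Assumed witness layout: [1, c, a, b]
A = [0, 0, 1, 0]  # selects a
B = [0, 0, 0, 1]  # selects b
C = [0, 1, 0, 0]  # selects c


def dot(row: list[int], witness: list[int]) -> int:
    """Inner product of a constraint row and the witness, in the field."""
    return sum(r * w for r, w in zip(row, witness)) % FIELD_PRIME


def make_witness(a: int, b: int) -> list[int]:
    """Compute the output c and arrange all values in the assumed layout."""
    c = (a * b) % FIELD_PRIME
    return [1, c, a, b]


def r1cs_satisfied(witness: list[int]) -> bool:
    """Check (A.w) * (B.w) == (C.w) for our single constraint."""
    left = (dot(A, witness) * dot(B, witness)) % FIELD_PRIME
    return left == dot(C, witness)
```

With private inputs 9 and 4, make_witness(9, 4) yields [1, 36, 9, 4], which satisfies the constraint. Swapping the public value for 37 while keeping the same inputs does not.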

Then we perform a trusted setup. The generated Common Reference String (CRS) consists of a proving key and a verification key. These keys can then be used every time we want to generate and verify proofs, respectively. They can be shared publicly and I provide mine here as part of the example:

Finally, we generate the proof using snarkjs with the Wasm file, the proving key, and our private inputs, which might be something like 9 and 4.

We get two JSON files: the proof (proof.json) and the public output (public.json).

We've proven that we know two secret values, a and b, whose product is 36. You can verify the proof (assuming you trust my verification key) with snarkjs.

$ snarkjs groth16 verify example1_verification_key.json public.json proof.json
[INFO]  snarkJS: OK!

If you change public.json to contain a different number, the proof will no longer be valid: I have not proved that I know the factors of this new number.

ZKPs, Blockchains, and Machine Learning (ZKML)

The convergence of ZKPs, blockchains (Web3), and machine learning is a rapidly advancing area with significant potential.

Blockchain use cases include:
  • Scaling Blockchains: Public blockchains have limited computational power. ZKPs enable computations to be executed off-chain, with only a small ZK proof verified on-chain. This scales blockchains without sacrificing decentralization or security. Examples include ZK rollups like Polygon zkEVM and zkSync.
  • Privacy-Preserving Applications: The zero-knowledge property is ideal for creating applications that protect users' privacy and personal data when making cryptographic attestations. Aztec Network, for instance, uses a ZK rollup for Ethereum where users' balances and transactions are completely hidden.
  • Identity Primitives and Data Provenance: Projects like WorldID use ZKPs for privacy-preserving proof-of-personhood protocols, allowing a person to prove they are a unique human without revealing their identity.

ZKML is about applying ZK proofs to machine learning models, specifically focusing on the inference step. The core motivations for ZKML include:

  • Verifying AI-Generated Content: With AI content becoming indistinguishable from human-created content, ZKPs can help determine that a particular piece of content was produced by applying a specific model to a given input.
  • Privacy-Preserving Inference: ZKPs allow you to apply an ML model to sensitive data, where a user can get the result of the model's inference without revealing their input to any third party.

While proving models as large as today's LLMs with ZKPs is not yet feasible, there has been significant progress on creating proofs for smaller models. Teams are actively working on improving ZK technology, including specialized hardware and proof system architectures, to allow proving bigger models on less powerful machines in less time.

Summary

While Merkle Trees excel at verifying data inclusion and consistency, ZKPs extend this idea to verifying computations and knowledge with the added benefit of privacy. This makes them incredibly powerful for building the next generation of scalable and private applications on blockchains, especially as AI-generated content and privacy concerns continue to grow. The future of verifiable content, whether data or computation, is increasingly intertwined with these advanced cryptographic proofs.

Sources

https://zkintro.com/articles/programming-zkps-from-zero-to-hero

https://world.org/blog/engineering/intro-to-zkml