AWS Database Blog
Use a DAO to govern LLM training data, Part 2: The smart contract
In Part 1 of this series, we introduced the concept of using a decentralized autonomous organization (DAO) to govern the lifecycle of an AI model, specifically focusing on the ingestion of training data. We outlined the overall architecture of our solution, which combines blockchain and generative AI technologies on AWS. We demonstrated how to set up and use a large language model (LLM) knowledge base with Amazon Bedrock, highlighting the steps to create an Amazon Simple Storage Service (Amazon S3) bucket, establish a knowledge base, and synchronize it with Ethereum Improvement Proposals (EIPs). This setup forms the foundation for our innovative approach to AI governance, setting the stage for implementing the smart contract in this part.
In this post, we focus on the writing and deployment of the Ethereum smart contract that contains the outcome of the DAO decisions.
Solution overview
The following diagram illustrates our Ethereum smart contract.
The smart contract stores a mapping between an externally owned account (EOA) and an IPFS CID. The EOA has to sign a transaction to initiate the upload of the training data from InterPlanetary File System (IPFS) to the AWS infrastructure hosting the knowledge base.
The advantage of storing the training data on IPFS is that it can be uniquely referenced using a content-based identifier and quickly shared on the IPFS network (for a peer review, for example). This also minimizes hard dependencies on the AWS infrastructure, because both the blockchain and IPFS nodes can be executed elsewhere. For a complete guide on running an IPFS cluster on AWS, refer to the IPFS on AWS series.
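To make the idea of a content-based identifier concrete, the following Python sketch derives a CIDv1 for a single raw block of bytes (raw codec, sha2-256 digest, base32 encoding). The function name raw_cidv1 is hypothetical. Note that pinning services and IPFS nodes usually chunk larger files and wrap them in a different codec, so the CID they report for a given file will generally differ from the value computed here. The sketch only illustrates that the identifier is derived from the content itself.

```python
import base64
import hashlib


def raw_cidv1(data: bytes) -> str:
    """Return the CIDv1 (raw codec, sha2-256, base32) of one block of bytes."""
    digest = hashlib.sha256(data).digest()
    # CID bytes: version 1, multicodec 'raw' (0x55), then the multihash:
    # sha2-256 code (0x12), digest length (0x20 = 32 bytes), and the digest.
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # The multibase prefix 'b' announces lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


# Identical content always yields the same identifier;
# changing a single byte yields a different one.
print(raw_cidv1(b"training data v1"))
print(raw_cidv1(b"training data v2"))
```

Because the identifier depends only on the bytes, a reviewer who fetches the data from any IPFS node can check that it matches the CID recorded on chain.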
In the following sections, you use Remix to create, deploy, and test a smart contract. Remix is an open source, web-based integrated development environment (IDE) that you can use directly from your browser.
Prerequisites
Review the prerequisites outlined in Part 1 of this series and complete the steps in the previous post to set up the solution components.
Set up MetaMask
If MetaMask is not already installed in your browser, follow the instructions in Getting started with MetaMask. On the network selection dropdown menu in the left pane, choose Sepolia (a testnet for Ethereum). MetaMask creates an account for you during setup. You need to fund this account before you can submit transactions to the network. Use one of the Sepolia faucets listed on the Ethereum.org networks page to request some test ether (ETH).
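If you want to confirm that the faucet funds arrived without opening MetaMask, you can query any Sepolia JSON-RPC endpoint with the standard eth_getBalance method. The following Python sketch only builds the request body and converts the hex-encoded wei value in the response to ETH. The helper names are hypothetical, the address shown is a placeholder, and the choice of endpoint and HTTP client is left to you.

```python
import json
from decimal import Decimal

WEI_PER_ETH = Decimal(10) ** 18


def balance_request(address: str, request_id: int = 1) -> str:
    """Build the JSON-RPC body for eth_getBalance at the latest block."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        "params": [address, "latest"],
        "id": request_id,
    })


def wei_hex_to_eth(result_hex: str) -> Decimal:
    """Convert the hex-encoded wei quantity returned by the node to ETH."""
    return Decimal(int(result_hex, 16)) / WEI_PER_ETH


# Placeholder address: replace it with your MetaMask account address.
print(balance_request("0x0000000000000000000000000000000000000000"))
# The "result" field of the response is a hex string of wei, for example:
print(wei_hex_to_eth("0x6f05b59d3b20000"))
```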
Write and deploy a smart contract in Remix
Let’s develop and deploy our smart contract. For simplicity, we use the OpenZeppelin Ownable module. Complete the following steps:
- Download the following contract to your computer.
- Open Remix, go to the contracts folder, and choose Open a file from your File System from the navigation pane to upload the crypto_ai.sol file.
You can inspect its content in Remix.
Note the following about the Solidity code:
  - The contract uses the OpenZeppelin Ownable module to control who is allowed to execute the contract functions.
  - The different elements composing the training set are stored in the training_set mapping, which maps addresses to strings representing IPFS CIDs.
  - The get_from_training_set function is public, so anyone can call it from outside the smart contract to consult the mapping.
  - The add_to_training_set function, which updates the mapping, can only be called by the owner of the contract, which by default is the contract creator. In a more advanced setup, the ownership of this contract could be transferred to a smart contract that is part of a DAO. Such a contract could be implemented using the Governor contracts from OpenZeppelin, which is beyond the scope of this post.
- In the navigation pane, choose the compiler icon.
- Choose a compiler version greater than 0.8 and check that the contract compiles without a warning.
- Choose the deployment icon in the navigation pane and change the environment to Injected Provider – MetaMask (instead of the default Remix VM).
MetaMask might ask you to confirm the connection.
- Choose Deploy to deploy the contract.
- Choose Confirm to confirm the transaction.
After the transaction is processed by the network, you will see the details of the transaction on the bottom right of the Remix page.
The contract is now successfully deployed. Record its address.
You can expand the newly deployed contract in the left pane and check which functions are available.
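The access rules described for crypto_ai.sol can be modeled in a few lines of plain Python. This is not the Solidity contract and does not run on chain; it is a minimal sketch of the behavior you should expect when you call the functions from Remix. The class and exception names are hypothetical, the sender argument stands in for the account that signs the transaction, and the single-address parameter of get_from_training_set is an assumption based on the mapping described earlier.

```python
class NotOwner(Exception):
    """Raised when a caller other than the owner makes a restricted call."""


class CryptoAIModel:
    """Plain-Python model of the rules described for crypto_ai.sol."""

    def __init__(self, deployer: str) -> None:
        # With Ownable, the deployer is the initial owner by default.
        self.owner = deployer
        # Mirrors the training_set mapping: address -> IPFS CID.
        self.training_set: dict = {}

    def _only_owner(self, sender: str) -> None:
        if sender != self.owner:
            raise NotOwner(f"{sender} is not the owner")

    def add_to_training_set(self, sender: str, approved_submitter: str,
                            ipfs_cid: str) -> None:
        """Owner-only write to the mapping."""
        self._only_owner(sender)
        self.training_set[approved_submitter] = ipfs_cid

    def get_from_training_set(self, submitter: str) -> str:
        """Public read; unset keys return an empty string, like a Solidity mapping."""
        return self.training_set.get(submitter, "")

    def transfer_ownership(self, sender: str, new_owner: str) -> None:
        """Owner-only handover, for example to a DAO-controlled contract."""
        self._only_owner(sender)
        self.owner = new_owner


contract = CryptoAIModel(deployer="0xDeployer")
contract.add_to_training_set("0xDeployer", "0xDeployer", "example-cid")
print(contract.get_from_training_set("0xDeployer"))
```

In this model, a call to add_to_training_set from any account other than the owner raises NotOwner, which corresponds to a reverted transaction on chain.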
Before interacting with the add_to_training_set function, you will generate a new IPFS CID for additional training data. You may also want to pin the contract in the Remix interface.
Create a new IPFS CID
In Part 1 of our series, you built a knowledge base on Ethereum EIPs. You can enrich this knowledge base with the content of the Ethereum yellowpaper. To record the content of this document on IPFS, you could deploy your own IPFS nodes, as described in the IPFS on AWS series. In this post, you simply use an IPFS pinning service such as a Filebase free tier account (feel free to use another pinning service of your choice).
- Download the Ethereum yellowpaper to your local machine and save it as ethereum-yellowpaper.pdf.
- Upload ethereum-yellowpaper.pdf to the IPFS pinning service.
- Record the IPFS CID of the file.
If you’re using Filebase, your interface should look like the following screenshot.
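Because the CID is pasted by hand into the smart contract in the next section, a quick shape check can catch copy-and-paste mistakes before you spend gas on a transaction. The following Python sketch accepts the two most common textual forms: CIDv0 (46 base58 characters starting with Qm) and base32-encoded CIDv1 (starting with b). The function name looks_like_cid is hypothetical, the minimum length for CIDv1 assumes a sha2-256 digest, and the check validates only the format, not that the content exists on IPFS.

```python
import re

# CIDv0: "Qm" followed by 44 characters of the base58btc alphabet
# (no 0, O, I, or l).
_CIDV0 = re.compile(r"^Qm[1-9A-HJ-NP-Za-km-z]{44}$")
# CIDv1 in base32: multibase prefix "b", then lowercase a-z and 2-7.
# Assumption: a sha2-256 digest, which gives at least 58 characters.
_CIDV1_BASE32 = re.compile(r"^b[a-z2-7]{58,}$")


def looks_like_cid(value: str) -> bool:
    """Return True if value has the shape of a CIDv0 or a base32 CIDv1."""
    return bool(_CIDV0.match(value) or _CIDV1_BASE32.match(value))


print(looks_like_cid("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG"))
print(looks_like_cid("ethereum-yellowpaper.pdf"))
```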
Add a new training data reference to the smart contract
Complete the following steps to add a new training data reference:
- In the Remix interface, expand the add_to_training_set function.
- For approved_submitter, use the same address as the one that deployed the contract.
You can use another one, but if you do so, you will need to use the same address while signing the transaction in Part 4.
- For ipfs_cid, enter the IPFS CID generated in the previous step.
- Choose transact and confirm when prompted.
After the transaction is processed by the network, you will see the details of the transaction on the bottom right of the Remix page.
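If you prefer to check the outcome outside of Remix, the standard eth_getTransactionReceipt JSON-RPC method returns the receipt once the transaction is mined (and null while it is still pending). The following Python sketch builds the request body and summarizes a receipt. The helper names and the sample receipt values are hypothetical; only the method name and the status and blockNumber receipt fields come from the Ethereum JSON-RPC API.

```python
import json
from typing import Optional


def receipt_request(tx_hash: str, request_id: int = 1) -> str:
    """Build the JSON-RPC body for eth_getTransactionReceipt."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "eth_getTransactionReceipt",
        "params": [tx_hash],
        "id": request_id,
    })


def summarize_receipt(receipt: Optional[dict]) -> dict:
    """Summarize the 'result' field of an eth_getTransactionReceipt response."""
    if receipt is None:
        # The node returns null until the transaction is included in a block.
        return {"mined": False, "success": False, "block_number": None}
    return {
        "mined": True,
        # A status of "0x1" means success; "0x0" means the call reverted,
        # for example because the sender is not the contract owner.
        "success": receipt.get("status") == "0x1",
        "block_number": int(receipt["blockNumber"], 16),
    }


# Illustrative receipt fragment, not real network data.
sample = {"status": "0x1", "blockNumber": "0x10"}
print(summarize_receipt(sample))
```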
Clean up
You can keep the components that you built, because you reuse them in the rest of the series. Alternatively, you can follow the cleanup instructions in Part 4 to delete them.
Conclusion
In Part 2 of this four-part series, we demonstrated how to deploy a smart contract using Remix and MetaMask on the Sepolia testnet. We also recorded an entry in the smart contract, referencing an IPFS CID.
In Part 3, we show how to create Lambda functions to upload the content of the IPFS CID to Amazon S3, and automatically run a knowledge base ingestion job whenever new data is added to the S3 bucket.
About the Authors
Guillaume Goutaudier is a Sr Enterprise Architect at AWS. He helps companies build strategic technical partnerships with AWS. He is also passionate about blockchain technologies, and is a member of the Technical Field Community for blockchain.
Shankar Subramaniam is a Sr Enterprise Architect in the AWS Partner Organization aligned with Strategic Partnership Collaboration and Governance (SPCG) engagements. He is a member of the Technical Field Community for Artificial Intelligence and Machine Learning.