Whitepaper

Last updated: Mar/13/2024

0 Abstract

A report from the International Data Corporation (IDC) reveals that the global AI computing market is projected to grow from $19.5 billion in 2022 to $34.66 billion in 2026, with a significant share attributed to the generative AI market. The emergence of generative AI, particularly with the advent of GPT-3, has catalyzed a transformation in the AI domain, accompanied by a notable surge in computing power requirements and costs. NVIDIA's near-monopoly in the AI GPU market has resulted in computing power becoming a new monopoly resource, while GPU leasing raises concerns regarding data privacy and security. In response to these challenges, Distri.AI is committed to establishing a distributed GPU computing network, a privacy-focused deep learning framework called PrivySphere, and a platform for model/data sharing. The aim is to reduce costs, enhance efficiency, safeguard data security, and propel advancements in AI technology.

[The above content was generated by ChatGPT-4.]

1 Introduction

1.1 Background

In the era of the industrial revolution, oil was the core energy source driving global development, profoundly influencing every industry. With the advent of the AI era, computing power has become the world's "digital oil," playing a crucial role in technological progress. We have witnessed the corporate race for AI chips and the record surge in NVIDIA's stock market value. These developments unequivocally indicate that computing power will be a key resource in the era ahead.

According to the Global Computing Power Index Assessment Report for 2022-2023, jointly released by the International Data Corporation (IDC), Inspur Information, and the Global Industry Research Institute of Tsinghua University, the global AI computing market is anticipated to experience significant growth. The report indicates that the global AI computing market was worth $19.5 billion in 2022 and is projected to reach $34.66 billion by 2026. Within this, the generative AI computing market is expected to grow from $820 million in 2022 to $10.99 billion in 2026, with its share of the overall AI computing market increasing from 4.2% to 31.7%. This data underscores the rapid development and growing significance of generative AI within the field of AI computing.

1.2 Motivation

Since the advent of GPT-3, generative artificial intelligence (AI) has sparked a transformative revolution in the field of AI, driven by its remarkable performance and widespread applications. This transformation has led numerous technology giants to engage in intense competition in AI research and development. However, this development has also brought forth a series of challenges, particularly in the realm of computing power requirements.

The training and operation of Large Language Models (LLMs) demand substantial computing power. As these models undergo continuous iteration and upgrading, the demand for computing power and the associated costs grow exponentially. Taking GPT-2 and GPT-3 as examples, GPT-3's parameter count is 1166 times that of GPT-2, and GPT-3's training cost, estimated at prevailing public GPU cloud prices, reaches a staggering $12 million, 200 times that of GPT-2. In practical applications, every user query involves inference computation. Using data from early 2023 as an example, serving more than 13 million independent users corresponds to a demand of over 30,000 A100 GPUs, an initial investment of roughly $800 million, and an estimated daily inference cost of $700,000. The latest model, GPT-4, with 1.8 trillion parameters, incurs a single training cost exceeding $60 million and demands an astonishing 2.15×10^25 floating-point operations (FLOPs) of compute. Future models are expected to escalate the demand for computing power even further.

Currently, the production of AI GPUs is nearly monopolized by NVIDIA, and prices have reached exorbitant levels, with the latest H100 selling for up to $40,000 per unit. These GPUs are quickly snapped up by Silicon Valley tech giants, partly for training their own models and partly for renting out through cloud platforms to AI developers. In this scenario, computing power has become a new monopoly resource, making it difficult for many AI developers to acquire dedicated GPUs at list prices and forcing them to rent AWS or Microsoft cloud servers instead. These cloud businesses enjoy high profit margins: AWS's cloud service gross margin stands at 61%, and Microsoft's is even higher at 72%.

On the other hand, the approach of leasing GPU devices comes with some privacy and security shortcomings. Firstly, when enterprises upload sensitive data to cloud platforms to leverage GPU resources, there is a risk of data privacy leakage stemming from security vulnerabilities within cloud service providers or external network attacks. Secondly, since public clouds operate in a multi-tenant environment, data and applications from different clients may share the same physical hardware, potentially leading to data cross-access and security isolation issues. Additionally, ensuring compliance with stringent data protection regulations in cloud environments, especially when handling cross-border data transfer and storage, poses a challenge for many organizations. Lastly, cloud platforms require robust access control and identity authentication mechanisms to safeguard data. Insufficient mechanisms may result in unauthorized access and data breaches.

Therefore, we are confronted with the question: Must we accept this centralized authority and control, paying exorbitant profits for computing power resources? Will the giants dominating the market in the Web2 era continue to monopolize in the new era?

1.3 Our Ideas

At Distri.AI, we are committed to building an intelligent, interconnected, and mutually beneficial computing infrastructure.

Distri.AI aggregates underutilized GPU resources globally, establishing the industry's first economically efficient, privacy-focused distributed GPU computing network and positioning itself at the forefront of intelligent computing. Distri.AI also places a dedicated emphasis on developing a deep learning framework with privacy preservation at its core, ensuring optimal protection for data and models during AI training and driving AI innovation in a secure and worry-free manner. Furthermore, Distri.AI's model/data sharing platform allows users to securely share AI training data and models while protecting the interests of data providers.

Join Distri.AI and collectively shape the future of intelligent computing, exploring the boundless possibilities of technology.

2 Key Contributions

2.1 Distributed GPU Computing Network

Distri.AI is dedicated to aggregating idle GPU computational resources globally, creating a distributed GPU computing network. The core objective of this network is to provide more cost-effective and transparent computational services, enabling users to conveniently and efficiently utilize these resources. Specific modes of computational services include:

  1. DAO Community Task Deployment: Users can release computational tasks through a decentralized autonomous organization (DAO) community. This approach allows users to harness widely distributed computational resources for large-scale distributed training, reducing costs while enhancing the efficiency and scalability of training. This business model is particularly suitable for tasks involving large datasets and complex algorithms, such as deep learning and big data analytics.

  2. Direct Lease of Compute Nodes: Users also have the option to directly lease compute nodes to acquire the required computational resources. This direct leasing model offers more flexibility and immediacy, allowing users to quickly access computing power based on their needs without committing to long-term agreements or substantial upfront investments. This model is suitable for projects requiring short-term, powerful computational capabilities, such as research computations and short-term data processing tasks.

2.2 Privacy-Preserving Machine Learning Framework

We introduce an innovative machine learning framework called PrivySphere within the distributed GPU computing network, specifically designed for hierarchical privacy preservation. The design philosophy of PrivySphere is to provide users with multi-tiered, flexible privacy-protection options, ensuring the privacy and security of data and models during AI training. The framework defines three levels of privacy-preserving mechanisms, ordered from low to high security needs: L1, L2, and L3. The specifics are as follows:

  1. PrivySphere-L1: L1 employs a privacy-preserving mechanism based on secure containers. At this level, the computing environment is securely isolated to ensure that operations within the container do not affect the external environment. The container image design is highly compatible, accommodating a variety of computing requirements. This level is suitable for scenarios that require basic privacy protection without extreme security guarantees.

  2. PrivySphere-L2: L2 enhances security by utilizing a privacy-preserving mechanism based on Trusted Execution Environments (TEE). This mechanism supports confidential computing, ensuring the security and verifiability of data during processing. It is suitable for scenarios requiring higher security, such as financial data analysis or the processing of personal privacy data.

  3. PrivySphere-L3: L3 represents the highest privacy-preserving level, employing a mechanism based on Secure Multi-Party Computation (SMC). This level offers provable security and high scalability, supporting programs from various mainstream machine learning (ML) frameworks. It is suited to scenarios with extremely high data-security requirements.

2.3 Model/Data Sharing Platform

To advance the progress and application of artificial intelligence technology, we intend to build a model/data sharing platform supporting users in the paid utilization of models and data for AI training. The core objective of this platform is to create more economic value while ensuring that models and data provided by contributors are not maliciously leaked or abused. We will leverage the PrivySphere deep learning framework based on hierarchical privacy-preserving to achieve this goal.

3 Design

The overall architecture of Distri.AI is structured from the bottom up, comprising the Computing Layer, Privacy Preserving Layer, and Ecosystem Layer. In the following sections of this chapter, we will sequentially introduce the design principles for each of these layers.

3.1 Computing Layer: Distributed GPU Computing Network

3.1.1 Network Architecture

The Distributed GPU Computing Network is an open and efficient system composed of multiple computing nodes. This network is open to individuals and organizations with computing resources, allowing anyone to join as a computing node. Due to the flexible entry and exit of computing power in the network, providers can join as computing nodes based on the idle periods of their computational resources, contributing computational resources to earn profits.

This computing network integrates blockchain technology with a Peer-to-Peer (P2P) network architecture. When a computing node joins the network, it must first complete registration on the blockchain. The computing network does not have a central node; instead, it adopts a decentralized architecture, allowing all nodes to communicate directly with each other.

Computing nodes can contribute computational resources to the computing network in two ways:

  1. DAO-Based Task Repository: All computing tasks released through the DAO community are stored in a public task repository. These tasks vary in difficulty, and computing nodes can select suitable tasks based on their capabilities and preferences. This mechanism enables efficient utilization of distributed and extensive computational resources for large-scale training by those in need of computing power.

  2. Order-Based Computing Power Market: The network features a computing power market where supply and demand can engage in peer-to-peer transactions. Computing nodes can lease computing resources and conduct direct transactions with demand-side users, earning profits. This model enhances the flexibility and immediacy of computational resource transactions.

3.1.2 Components

The primary role of a computing node is to provide computing power, with core functionalities encompassing key technologies such as P2P network communication, blockchain network interaction, task processing, and container operations. P2P network communication ensures efficient communication between nodes, facilitating the transfer of computing tasks and data. Computing nodes must possess the capability to interact with the blockchain network for task scheduling and economic transactions. Nodes also need to handle assigned computing tasks efficiently and accurately. Additionally, they should support containerization technology, allowing diverse machine learning environments to be configured and computing resources to be offered conveniently to consumers through resource virtualization.

To expand its functionalities, if a computing node aims to offer deep learning services based on privacy-preserving, it can choose to integrate advanced privacy protection components from PrivySphere. For more detailed information and technical specifications regarding the privacy-preserving components of PrivySphere, please refer to Section 3.2 of the documentation, where in-depth technical analysis is provided.

3.1.3 Revenue Model

The revenue model of the distributed GPU computing network is based primarily on computing power and offers two modes for utilization and profit generation: 1) a DAO-based task repository, suitable for researchers and open-source contributors and providing opportunities for free usage; and 2) an order-based computing power market, suitable for users with stricter requirements on computing services (stability, security, and so on), where prices are shaped by transparent market mechanisms and competition, offering better cost-effectiveness. The core processes of the two modes are outlined below; an illustrative data sketch follows the outlines.

  1. DAO-based Task Repository

    1. Computing demand parties need to compile detailed and clear computing task proposals, including execution environment parameters, data retrieval links, and execution steps.

    2. DAO establishes different partitions based on the difficulty and type of computing tasks. Demand parties should submit proposals in the relevant partition for community evaluation.

    3. Computing demand parties may revise their proposals based on community feedback, unless the DAO deems a proposal unsuitable and terminates the process.

    4. Upon approval, DAO publishes the task to the on-chain task repository, awaiting compute nodes to claim.

    5. Compute nodes can select tasks from the task repository based on their capabilities and preferences.

    6. Nodes must complete the computing tasks as required and submit the results.

    7. Upon successful validation, the node receives workload certification corresponding to the task difficulty.

    8. The task repository periodically issues incentives from the DAO treasury, with nodes earning profits based on their share of the overall network workload.

  2. Order-based Computing Power Market

    1. Compute nodes pledge and place orders in the computing power market, providing hardware and software parameters along with pricing.

    2. Computing demand parties choose and place orders to purchase computational resources in the market.

    3. During the ordering process, the leasing duration and prepayment are determined.

    4. Throughout the lease period, compute nodes provide their device resources exclusively to the user.

    5. Upon lease termination or early termination by the user, device resources are released.

    6. After settlement, compute nodes receive payment from the computing power market.
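
To make the two flows above concrete, the sketch below shows what a DAO task proposal and a market order might look like as plain data records. Every field name here is an illustrative assumption; this document does not specify a schema for either mode.

```python
# Hypothetical data shapes for the two computing-power modes.
# Field names are illustrative only; Distri.AI does not publish a schema here.

dao_task_proposal = {
    "partition": "deep-learning",          # DAO partition chosen by task difficulty/type
    "env": {"image": "pytorch:2.1-cuda12", "gpu": "A100", "gpu_count": 4},
    "data_url": "ipfs://<dataset-cid>",    # data retrieval link
    "steps": [                             # execution steps the node must follow
        "pip install -r requirements.txt",
        "python train.py --epochs 10",
    ],
    "reward_basis": "workload-certification",
}

market_order = {
    "node_id": "node-123",                 # pledged compute node listed in the market
    "hardware": {"gpu": "RTX 4090", "vram_gb": 24, "ram_gb": 64},
    "price_per_hour": 0.35,                # set by the node, disciplined by market competition
    "lease_hours": 72,                     # leasing duration agreed at ordering time
    "prepaid": True,                       # prepayment determined during ordering
}
```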

3.2 Privacy Preserving Layer: ML Framework Based on Hierarchical Privacy Preserving

3.2.1 Overall Architecture

In the distributed GPU computing network we have developed, we introduce an innovative ML framework called PrivySphere, specifically designed to provide hierarchical privacy preservation. The core design philosophy of PrivySphere is to offer users multiple flexible tiers of privacy protection, ensuring comprehensive privacy and security for both data and models during AI training. To achieve this, PrivySphere defines three distinct levels of privacy-preserving mechanisms, arranged from low to high security requirements and identified as L1, L2, and L3. Each level addresses different security and privacy needs, allowing users to choose the level that best fits their requirements, so that they can make full use of the computing power network while safeguarding data security.
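
As a purely illustrative sketch (this document does not define a client-facing API), level selection can be pictured as a parameter attached to a training job; the enum values and the `job` fields below are assumptions, not Distri.AI identifiers.

```python
from enum import Enum

class PrivacyLevel(Enum):
    """The three PrivySphere tiers described above."""
    L1 = "secure-container"   # OS-level isolation (namespaces/cgroups)
    L2 = "tee"                # hardware Trusted Execution Environment
    L3 = "smc"                # secure multi-party computation

# Hypothetical job submission: the caller picks the tier matching its
# security requirements; the rest of the job description is unchanged.
job = {
    "model": "resnet50",
    "dataset_url": "ipfs://<dataset-cid>",
    "privacy_level": PrivacyLevel.L2.value,
}
```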

3.2.2 PrivySphere-L1

PrivySphere-L1 aims to provide a secure and flexible computing environment for scenarios with basic privacy-preserving requirements, primarily achieved through a secure container mechanism. At this level, the computing environment undergoes secure isolation, ensuring that operations within the container do not impact the external environment. Leveraging operating system-level isolation technologies such as Linux's Namespaces and Cgroups, it creates an isolated and independent runtime environment.

In terms of security controls, PrivySphere-L1 implements strict access control and data encryption to safeguard resource access and data security. The system integrates comprehensive monitoring and auditing features, including real-time logging, anomaly detection, and internal container activity logging, promptly identifying and addressing security threats to protect user data and model security.

PrivySphere-L1 also emphasizes the compatibility and maintainability of container images, supporting various computational requirements and applicable to different hardware and operating systems. Through image repositories and version management, the system ensures continuous updates and security of images. User-friendly interfaces and automated maintenance features simplify management, reducing operational pressure.

3.2.3 PrivySphere-L2

PrivySphere-L2 aims to provide advanced security and privacy protection, its key feature being a mechanism based on a Trusted Execution Environment (TEE). This requires compute nodes to be equipped with TEE-capable hardware so that security is enforced at the hardware level. The TEE protects computational activities by creating a hardware-isolated execution environment that external systems cannot access or modify, increasing the trustworthiness and verifiability of data processing. For example, in the context of model training, the main process includes the following steps (a minimal encryption sketch follows them):

  1. Encrypted Transmission of Data and Models: Data and models are encrypted before being sent to the compute node, ensuring the security of the transmission process.

  2. Secure Loading and Execution of Data and Models: Encrypted data and models are securely transmitted to the TEE environment of the computing device. After decryption within the TEE, model training takes place. Due to the isolating nature of TEE, the training process is invisible to external systems, ensuring security.

  3. Secure Output of Results: After training is complete, results (such as model weights or analysis data) are re-encrypted before leaving the TEE, ensuring security even in non-trusted environments.
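
Steps 1 and 3 amount to standard symmetric encryption on either side of the enclave boundary. The minimal sketch below uses Fernet from the Python `cryptography` package purely for illustration; the actual wire format, key negotiation (typically bound to TEE remote attestation), and the APIs PrivySphere-L2 uses are not specified in this document.

```python
from cryptography.fernet import Fernet

# Illustration only: in practice the key would be negotiated with the TEE
# after remote attestation, not generated locally like this.
key = Fernet.generate_key()
cipher = Fernet(key)

# Step 1: encrypt data/model weights before sending them to the compute node.
plaintext = b"serialized model weights + training data"
ciphertext = cipher.encrypt(plaintext)
payload_for_compute_node = ciphertext       # only ciphertext leaves the user's machine

# Inside the TEE (step 2): decrypt, train, then re-encrypt the results (step 3).
recovered = cipher.decrypt(ciphertext)      # visible only within the enclave
trained_result = recovered                  # placeholder for the actual training output
result_ciphertext = cipher.encrypt(trained_result)
```

In a real deployment the decryption key exists only inside the attested enclave, so the plaintext data and model are never visible to the compute node's host operating system.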

3.2.4 PrivySphere-L3

PrivySphere-L3 employs a mechanism based on Secure Multi-Party Computation (SMC) and is specifically designed for scenarios with the highest demands for data security. In this system, SMC allows multiple participants to collaboratively compute the result of a function without disclosing their respective input data. The key advantage of this approach lies in providing provable security, meaning that the privacy and security of data during processing can be guaranteed through mathematical and cryptographic principles. PrivySphere-L3 achieves maximum protection for user datasets and model privacy through innovative algorithm design and optimization of SMC.

Another notable feature of PrivySphere-L3 is its high scalability and support for various mainstream machine learning (ML) frameworks. PrivySphere-L3 has designed flexible interfaces and APIs, enabling seamless integration with a variety of popular ML frameworks such as TensorFlow, PyTorch, and others. This ensures that users can leverage advanced machine learning techniques for data analysis and model training while maintaining the privacy of their data.

3.3 Ecosystem Layer: Model/Data Sharing Platform

3.3.1 Platform Overview

A rich and comprehensive ecosystem is incomplete without model/data sharing, which maximizes the value that can be created. Distri.AI's Model/Data Sharing Platform enables users to make paid use of models and data for AI training. The benefits of this platform are as follows:

  1. Resource Sharing: The platform allows users to share models and data, maximizing resource utilization. Users can access and use models and data created by others, leveraging compute nodes for model training. Simultaneously, users can transition from being consumers to providers, sharing completed models with others.

  2. Cost Reduction: Users can utilize shared models and data by paying relatively small fees, avoiding the need to build large-scale datasets or undergo complex model training processes.

  3. Collaboration Opportunities: The platform provides collaboration opportunities for professionals from different domains. Researchers, engineers, and businesses can share their knowledge, skills, and resources, fostering broader collaboration and exchange.

  4. Diversity and Universality: The platform may host models and data from various fields and industries, providing users with diverse and widely applicable resources.

  5. Data Privacy and Security: Leveraging PrivySphere, the platform can establish appropriate usage and sharing agreements to ensure that models and data provided by contributors are not maliciously leaked or misused. This helps build user trust and encourages more participation in the sharing platform.

3.3.2 Security and Privacy Preserving

Ensuring security and privacy is paramount in the process of model/data sharing. PrivySphere provides comprehensive privacy and security protection for AI training, covering both data and models. The privacy-preserving process of Distri.AI's Model/Data Sharing Platform can be summarized as follows:

  1. Users (i.e., model/data consumers) search for the desired model or dataset on the platform.

  2. Upon selecting a model/dataset, users sign a user agreement with the provider.

  3. Model/data providers rent compute nodes supporting PrivySphere in the computing power market.

  4. Providers utilize these compute nodes to deliver computation results to users.

3.3.3 Community

The Model/Data Sharing Platform will establish an exclusive community for communication, aiming to emulate the successful patterns of platforms like Hugging Face and Kaggle, welcoming all users. This community will serve as a fertile ground for knowledge sharing and collaboration, fostering in-depth exchanges among users about experiences, skills, and best practices. In this interactive environment, users can not only freely explore and use various models and datasets but also provide evaluations and feedback directly on fee-based models or datasets from providers. Such an evaluation mechanism enhances community transparency and interactivity, encourages the identification and reward of high-quality contributions, further motivating contributors to pursue excellence. Additionally, this community will provide a platform for both beginners and experts to learn and grow together, collectively driving innovation and development in models and data.

4 Core Implementation

4.1 Privacy-Preserving Based on Secure Containers

4.1.1 Overview

PrivySphere-L1, based on Kata Containers technology, provides users with a virtual machine-level isolation environment, effectively preventing compute nodes from easily accessing user data as host machines, thereby ensuring data privacy. The main components involved in PrivySphere-L1 are as follows:

  • cri-o/containerd: cri-o implements the container runtime interface used by Distri.AI and provides an integration path for OCI-compatible runtimes. In Distri.AI, cri-o is the default container runtime.

  • containerd-shim-kata-v2: This is the Kata container runtime, distributing containers to the virtual machine. Hence, containers are "isolated" within the virtual machine. Communication between cri-o/containerd and the Kata container runtime is based on ttrpc.

  • Qemu/KVM: Kata containers support various virtualization technologies.

  • Kata Agent: kata-agent is a process running inside the guest virtual machine, responsible for managing containers and the processes running within them. The execution unit of kata-agent is a sandbox, defined by a set of namespaces (NS, UTS, IPC, and PID). kata-runtime can run multiple containers within the same virtual machine to support workloads that require multiple containers in the same Pod. kata-agent communicates with the Kata container runtime (containerd-shim-kata-v2) using ttrpc over vsock.

4.1.2 Process

The following steps outline the process of creating secure containers in PrivySphere-L1:

  1. Users construct configuration files; the default templates contain comprehensive access-control policies, data-encryption settings, and monitoring and auditing configurations (a hypothetical sketch of such a file follows these steps).

  2. Users remotely request compute nodes to create corresponding secure containers.

  3. The PrivySphere-L1 container manager daemon on the compute node runs a single instance of the Kata runtime.

  4. The Kata runtime loads its configuration file.

  5. The container manager invokes a set of shimv2 API functions at runtime.

  6. The Kata runtime initiates the configured virtualization program.

  7. The virtualization program uses the guest assets to create and launch a virtual machine (VM).

  8. The agent starts as part of the VM startup process.

  9. The runtime calls the agent's CreateSandbox API to request the creation of a container.

  10. The container manager returns control of the container to the user.
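
As a rough illustration of step 1 only: the whitepaper names the kinds of settings the default template bundles (access control, data encryption, monitoring and auditing) but not a concrete format, so every key below is an assumption rather than a documented Distri.AI schema.

```python
# Hypothetical shape of a PrivySphere-L1 secure-container request.
# All key names are assumptions; the document does not define a schema.
secure_container_config = {
    "image": "ml-workspace:latest",                # container image to run inside the Kata VM
    "access_control": {
        "allowed_users": ["<renter-public-key>"],  # only the requesting user may attach
        "ssh_enabled": False,
    },
    "data_encryption": {
        "volume_cipher": "aes-256-gcm",            # encrypt data at rest inside the VM
        "transport": "tls",
    },
    "monitoring": {
        "realtime_logging": True,                  # monitoring/auditing features from Section 3.2.2
        "anomaly_detection": True,
        "audit_container_activity": True,
    },
}
```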

4.2 Privacy-Preserving Based on Trusted Execution Environment

Confidential computing is a technique for performing computations without exposing the raw data. PrivySphere-L2 will introduce and apply confidential computing in the field of machine learning. This technology is particularly important for tasks involving sensitive data or privacy concerns. By securely transmitting raw data to the TEE of the compute nodes and performing training and inference of machine learning models within it, PrivySphere-L2 can protect the privacy of data while still providing accurate analysis and predictive results.

4.2.1 Unpartitioned Execution

Completing ML training/inference within the TEE is the most direct approach. In this scenario, the maximum capacity of ML tasks is strictly constrained by the storage and computational resources of the TEE. Early work has focused on implementing and testing on small-scale ML algorithms, where the complete inference or training process can be integrated into the TEE. However, fitting large ML models into the TEE becomes impractical when the secure memory required for training exceeds the capacity of the TEE.

4.2.2 Partitioned Execution

To avoid exceeding the maximum secure computing resources and memory swapping, PrivySphere-L2 utilizes effective partitioned execution to proactively optimize memory usage during ML processes.

A. Layer-based Partitioning

Layer-based partitioning is generally applicable to models with a layered structure: the model is processed one layer (or group of layers) at a time, so only part of it needs to reside in the TEE's secure memory at any moment.

B. Feature Map-based Partitioning

Convolutional layers incur high memory costs. Protecting the first layer preserves the privacy of the original inputs, while protecting the last layer preserves the privacy of the outputs. Rolling layers through the TEE suits ML inference, since inputs propagate forward through each layer exactly once and never return: the current layer's computation is executed inside the TEE, after which the next layer is loaded in. Rolling feature maps reduces real-time memory usage by applying General Matrix Multiplication (GEMM) and image-to-column (im2col) format transformations to partitioned sections and channels of the feature maps. However, this disables parallelization and therefore increases computation time.
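
A conceptual sketch of layer rolling for inference, assuming hypothetical `load_into_tee` / `unload_from_tee` primitives (this document does not name PrivySphere-L2's actual interfaces): only one layer's weights and activations need to occupy secure memory at a time.

```python
def rolling_inference(layers, x, load_into_tee, unload_from_tee):
    """Run a layered model inside a TEE one layer at a time.

    `layers` is an ordered list of layer objects; `load_into_tee` and
    `unload_from_tee` are hypothetical primitives that move a layer's
    weights in and out of the enclave's limited secure memory.
    """
    for layer in layers:
        secure_layer = load_into_tee(layer)   # bring only this layer into the TEE
        x = secure_layer(x)                   # forward pass inside the enclave
        unload_from_tee(secure_layer)         # free secure memory for the next layer
    return x                                  # final activations, re-encrypted on exit
```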

4.3 Privacy-Preserving Based on Secure Multi-Party Computation

PrivySphere-L3, based on secure multiparty computation for machine learning (MPC-ML), offers users a Python API, enabling them to seamlessly input machine learning programs (slightly modified to specify protected data). The compiler receives these machine learning programs and transforms them into a custom intermediate representation (IR) named Distri_HLO (Distri High-Level Operations) as output. The Distri_HLO intermediate representation preserves the structure and logic of machine learning algorithms while ensuring privacy and security.

The backend runtime of L3 is built on virtual devices spanning multiple connected compute nodes. These virtual devices receive Distri_HLO and execute it by running MPC protocols among the nodes, thereby completing secure machine learning training or prediction. This distributed computing architecture enables L3 to handle large-scale datasets and complex machine learning models while safeguarding the privacy and security of data. By providing a Python API and a custom intermediate representation, L3 streamlines the use of secure multiparty computation for machine learning, making it more convenient and efficient.

4.3.1 Privacy Preserving Programming Interface

PrivySphere-L3 offers a machine learning Python API compatible with mainstream frameworks, enabling the implementation of PPML programs. Users simply need to specify the data and private functions to be protected, facilitating MPC. Once the protected objects are selected, the data is transformed into multiple secret shares through a secret sharing mechanism. Once all inputs are transformed into secret shares and appropriately distributed, participating compute nodes can commence collaborative computation. Throughout the computation process, each operation (such as addition, multiplication, etc.) has a corresponding secret shared version, ensuring that the execution of operations does not leak any information about the input shares. Participating compute nodes only handle and exchange encrypted or partitioned data shares throughout the computation process.
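
The toy example below illustrates the additive secret-sharing idea behind this interface; it is a self-contained sketch, not PrivySphere-L3 code. A secret is split into random shares that sum to it, and linear operations can be carried out share-by-share without any single node seeing the underlying values.

```python
import secrets

MODULUS = 2**64  # shares live in a fixed ring

def share(secret: int, n_parties: int = 3):
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

a_shares = share(20)
b_shares = share(22)

# Each compute node adds its local shares; no node ever sees 20 or 22.
sum_shares = [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 42
```

Multiplying two secret values, by contrast, requires an interactive protocol (e.g., Beaver triples), which is the source of the truncation and communication costs discussed in the compiler optimizations below.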

4.3.2 IR Generation and Optimization

Typically, the front-end of compilers converts the source code written using existing frameworks into hardware-independent IR, while the back-end of compilers further translates IR into hardware-specific machine code. With machine learning compilers, front-end frameworks only need to focus on generating IR, while back-end hardware vendors only need to focus on supporting IR instructions. The IR used in machine learning compilers is often represented as a computational graph (i.e., a directed acyclic graph). Graph nodes represent machine learning operations (such as matrix multiplication and convolution), with inputs and outputs being tensors (i.e., multi-dimensional arrays). Graph edges depict data dependencies between operations.

One widely used machine learning compiler is Google's XLA. XLA defines its IR as HLO (High-Level Operations) to represent computational graphs. A range of front-ends, including TensorFlow, PyTorch, and JAX, support XLA. Machine learning programs written in these frameworks can be compiled into HLO. After hardware-independent and hardware-specific optimizations, HLO is ultimately lowered by the XLA backend to machine code, running on CPU, GPU, or TPU.

We have designed Distri_HLO based on HLO as a customized IR for L3 because HLO lacks semantics related to MPC for optimization and efficient execution. In essence, Distri_HLO represents a computational graph consisting of a series of operations. The inputs and outputs of each operation are tensors. The tensor type system is the most significant difference between Distri_HLO and other machine learning counterparts. In Distri_HLO, the type of a tensor can be represented by a triplet <shape, data type, visibility>. The shape denotes the dimensions of the tensor. As for the data type, Distri_HLO currently supports boolean, integer, and fixed-point numbers. Visibility is a unique tensor attribute in Distri_HLO. It can be either secret or public. Secret implies that the tensor needs protection, and its actual value is invisible to any node in the L3 backend. In contrast, public means that the tensor doesn't require protection, and its value can be accessed by any backend node. This attribute is set by users in the programming interface as discussed in the preceding section.
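
The triplet can be pictured as a small type record. The sketch below is illustrative only and not the actual Distri_HLO definition; the field and enum names are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple

class Visibility(Enum):
    SECRET = "secret"   # value hidden from every backend node
    PUBLIC = "public"   # value readable by any backend node

@dataclass(frozen=True)
class TensorType:
    """Illustrative stand-in for Distri_HLO's <shape, data type, visibility> triplet."""
    shape: Tuple[int, ...]
    dtype: str            # Distri_HLO currently supports boolean, integer, fixed-point
    visibility: Visibility

weights = TensorType(shape=(784, 128), dtype="fxp64", visibility=Visibility.SECRET)
learning_rate = TensorType(shape=(), dtype="fxp64", visibility=Visibility.PUBLIC)
```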

Distri_HLO has undertaken the following efforts in compiling optimizations related to MPC:

  • Mixed-data-type multiplication fusion: In conventional machine learning computation, multiplying an integer by a decimal typically invokes an intermediate conversion that turns the integer into a floating-point number, after which the multiplication is dispatched to a floating-point multiplication kernel. If this graph were used directly in L3, the integer would first be converted into a fixed-point number, followed by a fixed-point multiplication that requires a truncation to maintain decimal precision. In practice, we can fuse the conversion and the multiplication into a single operation, eliminating the redundant truncation and conversion.

  • Mixed-visibility multiplication operand reordering: Multiplying a secret fixed-point number by two public fixed-point numbers involves two multiplication operations. Evaluated left to right, each operation produces a secret product that requires truncation, leading to higher communication overhead in certain MPC protocols. However, the operands can be reordered without affecting correctness: first multiply the two public fixed-point numbers; since their product is also public, it can be truncated by a local shift. Then multiply the result by the secret fixed-point number. Rearranging the operands in this way saves one expensive truncation (a small numeric example follows).
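
A small fixed-point sketch of the reordering, with made-up numbers. The `trunc` step is a free local shift for public values but an interactive MPC step for secret-shared values; the reordered form performs only one such secret truncation.

```python
SCALE = 2**16  # fixed-point scaling factor

def to_fxp(x: float) -> int:
    return int(round(x * SCALE))

def trunc(x: int) -> int:
    """Drop the extra scale factor after a fixed-point multiply.
    Free local shift for public values; an interactive MPC step for secret ones."""
    return x // SCALE

s, p1, p2 = to_fxp(3.5), to_fxp(0.25), to_fxp(8.0)   # s is secret, p1/p2 are public

# Naive order: (s * p1) then * p2 -> two secret products, two secret truncations.
naive = trunc(trunc(s * p1) * p2)

# Reordered: p1 * p2 first (public, truncated locally), then one secret multiply.
reordered = trunc(s * trunc(p1 * p2))

assert naive == reordered == to_fxp(7.0)   # 3.5 * 0.25 * 8.0 = 7.0
```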

5 Future Work

We will continue our journey towards building an intelligent, interconnected, and mutually beneficial computational infrastructure. In the near term, our exploratory domains will include, but are not limited to, the following:

  • Refining large-scale model training for heterogeneous wide-area networks, aiming to boost the efficiency of large-scale model training on Distri.AI.

  • Integrating Zero-Knowledge Proofs (ZKP)/Zero-Knowledge Machine Learning (ZKML) technologies to achieve concise verification of model training and inference correctness.

  • Delving into how privacy computing can be combined with SecMLOps to make the AI application development process both straightforward and secure.

  • Creating a Web3 version of Hugging Face to unlock the value of ML creators and foster the construction of a universal AI innovation environment.

6 Conclusion

In summary, we have explored the vision and practice of Distri.AI, aimed at establishing an intelligent, interconnected, and mutually beneficial computational infrastructure. By aggregating underutilized GPU resources globally, Distri.AI has successfully created the industry's first distributed GPU computing network with a focus on privacy protection, securing its leadership in the realm of intelligent computing. Concentrating on developing a deep learning framework with privacy protection at its core, Distri.AI ensures the security of data and models during AI training, providing worry-free impetus for AI innovation. Moreover, the launch of its model/data sharing platform not only facilitates the secure sharing of AI training data and models but also guarantees the interests of data providers are fully protected. Joining Distri.AI means not just participating in a project, but becoming part of a grand journey towards technological advancement and societal progress.

Please refer to the PDF version for more details.