Distributed data processing (DDP) is an approach in which data processing capabilities are spread across the nodes of a network. In DDP, tasks, functions and processes are distributed so that they can be invoked, shared or resumed in parallel. Each node can process some or most of the functions of a task. These nodes, which may be servers or any other devices capable of processing a task, are connected by a network that may lie within a single region or span multiple regions.
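The split, process-in-parallel and combine pattern described above can be sketched in Python. This is a minimal illustration, not a real DDP deployment. It assumes that worker threads from the standard library's `concurrent.futures` module stand in for networked nodes. The task (a sum of squares) and the helper names `split_into_chunks`, `process_chunk` and `distributed_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, num_nodes):
    """Divide the input into roughly equal chunks, one per node."""
    size = max(1, -(-len(data) // num_nodes))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk):
    """The part of the task each node performs: here, a sum of squares."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, num_nodes=4):
    chunks = split_into_chunks(data, num_nodes)
    # Each worker thread stands in for one node in the network (an assumption
    # for illustration; real nodes would be separate machines).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Combine the partial results from every node into the final answer.
    return sum(partial_results)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10))))  # 285
```

In a real DDP each chunk would be sent over the network to a separate server, but the structure of splitting the work, processing the pieces in parallel and combining the partial results stays the same. Threads keep the example portable; for CPU-bound work on a single machine, `ProcessPoolExecutor` would be the closer analogue.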
Why is DDP used in Artificial Intelligence services?
Artificial Intelligence (AI) is a field that often relies on algorithms requiring heavy computational and storage resources to process data. Many businesses that can afford such resources distribute their AI processes across multiple nodes. Distributing processes in this way lets them be managed and executed efficiently, with fewer performance problems and shorter run times. Distributing data across multiple storage systems also makes the stored data easier to manage and analyze.
Scalability is another benefit of using distributed systems in AI. The resources an AI-based system needs may have to scale with its usage. If a system is deployed on a single node, scaling it is slow and error-prone. Scaling is much easier when a system's tasks are distributed across nodes that are linked and executed in parallel, and some systems can scale automatically according to their configuration. DDP therefore improves resource usage in AI systems and helps ensure their availability.
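One common way to scale automatically from configuration is a proportional rule, such as the one documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), clamped to a configured minimum and maximum. The sketch below is a simplified version under stated assumptions. It uses CPU utilization in whole percent as the metric and leaves out refinements such as the autoscaler's tolerance band. The function name `desired_replicas` and its default bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to load, clamped to [min, max].

    Utilization values are whole percentages (an assumption for this sketch).
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    if current_replicas == 0:
        # No running replicas to measure: fall back to the configured minimum.
        return min_replicas
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    print(desired_replicas(4, 90, 60))   # 6: load above target, scale out
    print(desired_replicas(4, 30, 60))   # 2: load below target, scale in
    print(desired_replicas(4, 200, 50))  # 10: 16 requested, capped at maximum
```

The clamp matters in practice: the upper bound limits cost during load spikes, and the lower bound keeps enough nodes running to preserve availability.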
DDP also improves the fault tolerance of a system. When tasks are replicated and certain parts of the system fail, other parts can continue serving the process, so the system does not stop. Disaster recovery is made possible by hosting the system in multiple availability zones or geographical regions. Each zone hosts a copy of the system, and the zones are linked by a network so that traffic can be switched between them in case of a system-level or task-level failure.
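Switching between replicated zones can be illustrated with a simple ordered failover loop. This minimal sketch assumes that each zone exposes a `serve` method that raises an exception when the zone is unavailable. The `Zone` class, the `ZoneUnavailable` exception, the `serve_with_failover` function and the zone names are all hypothetical. A production system would typically add health checks, timeouts and retries.

```python
class ZoneUnavailable(Exception):
    """Raised when a zone cannot serve a request."""


class Zone:
    """A hypothetical availability zone hosting one copy of the system."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def serve(self, request):
        if not self.healthy:
            raise ZoneUnavailable(self.name)
        return f"{self.name} handled {request}"


def serve_with_failover(zones, request):
    """Send the request to each replica zone in order until one succeeds."""
    failures = []
    for zone in zones:
        try:
            return zone.serve(request)
        except ZoneUnavailable as err:
            # Record the failed zone and fall through to the next replica.
            failures.append(str(err))
    raise RuntimeError(f"all zones failed: {failures}")


if __name__ == "__main__":
    zones = [Zone("zone-a", healthy=False), Zone("zone-b"), Zone("zone-c")]
    print(serve_with_failover(zones, "predict"))  # zone-b handled predict
```

Because every zone holds a full copy of the system, the caller only sees an error when all replicas are down at once, which is the property that multi-zone hosting is meant to provide.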
Issues with traditional DDP in AI
Security is one of the issues with using a distributed system. When a system is distributed, multiple nodes must be monitored for unauthorized access, and the network must be secured against attacks. The costs of security software, monitoring and security expertise are among the downsides of managing a DDP-based AI system.
Another issue is managing the network of a DDP system. The network, including its disaster recovery systems, needs to be set up, monitored and troubleshot. A business also has to bear the costs of adding, removing or troubleshooting nodes. Furthermore, AI systems do not run on their own: several supporting functions, such as alerting on issues and running background services, must also be managed by the user.
Using the cloud for DDP
Many AI systems used today are deployed in the cloud because of the issues with user-managed DDP systems described above. Cloud vendors manage the network, availability zones, resources and functions used by AI-based systems. As a result, hosting a distributed AI system with a cloud vendor can cost less than running a user-hosted DDP. The user's effort for disaster recovery, monitoring and maintenance is also lower, because these are provided as managed services by the cloud vendor.
The security of hosting systems in the cloud is debated by many users. Most cloud vendors monitor and protect their networks against a range of network-related attacks, including attacks that originate within their own networks. Managing security in a cloud-based approach also remains a matter of debate, because most businesses use a hybrid cloud to host their data and/or systems. Nevertheless, cloud hosting has changed the traditional security of DDP in AI and has allowed most businesses to use AI with less effort.
Image courtesy: https://www.zte.cn.com/