

Distributed and Democratized Learning: Philosophy and Research Challenges

Minh N. H. Nguyen, Shashi Raj Pandey, Kyi Thar, Nguyen H. Tran, Senior Member, IEEE, Mingzhe Chen, Member, IEEE,

Walid Saad, Fellow, IEEE, and Choong Seon Hong, Senior Member, IEEE.

Abstract—Due to the availability of huge amounts of data and processing abilities, current artificial intelligence (AI) systems are effective in solving complex tasks. However, despite the success of AI in different areas, the problem of designing AI systems that can truly mimic human cognitive capabilities, such as artificial general intelligence, remains largely open. Consequently, many emerging cross-device AI applications will require a transition from traditional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform multiple complex learning tasks. In this paper, we propose a novel design philosophy called democratized learning (Dem-AI) whose goal is to build large-scale distributed learning systems that rely on the self-organization of distributed learning agents that are well-connected, but limited in learning capabilities. Correspondingly, inspired by the societal groups of humans, the specialized groups of learning agents in the proposed Dem-AI system are self-organized in a hierarchical structure to collectively perform learning tasks more efficiently. As such, the Dem-AI learning system can evolve and regulate itself based on the underlying duality of two processes, which we call the specialized and generalized processes. In this regard, we present a reference design as a guideline to realize future Dem-AI systems, inspired by various interdisciplinary fields. Accordingly, we introduce four underlying mechanisms in the design: a plasticity-stability transition mechanism, self-organizing hierarchical structuring, specialized learning, and generalization. Finally, we establish possible extensions and new challenges for the existing learning approaches to provide more scalable, flexible, and powerful learning systems under the new setting of Dem-AI.

Index Terms—Democratized Learning, distributed learning, self-organization, hierarchical structure.

I. INTRODUCTION

The growing success of AI in real-life applications has proliferated its usage. AI has provided a plethora of solutions for complex problems across multiple fields such as decision support systems in healthcare, automation in retail and industries, advanced control and operations, and telecommunications, among others. Correspondingly, numerous research activities in machine learning technologies [1]–[8] focused on architectures and algorithm designs that empowered the emergence of cross-device AI applications in our daily lives. However, in practice, the performance efficiency and re-usability of trained AI systems are quite limited, particularly when seeking to solve multiple complex learning tasks and when dealing with unseen data, due to their rigid design and learning settings. To address these issues, the recently proposed meta-learning framework (MLF) [9] provides capabilities that allow generalization from large-scale training over similar tasks. Hence, MLF is able to quickly adapt to similar new tasks using only a small number of training samples. Meanwhile, the so-called multi-task learning (MTL) frameworks introduced in [10] and [11] allow training a general model for a small number of tasks; however, they require significant similarity among those tasks. Therefore, there is an imminent need for rethinking existing machine learning systems and transforming them into systems that can control the generalization ability (i.e., good performance on unseen data of single/multiple tasks) together with the specialization ability (i.e., good performance on the learning tasks).

M. N. H. Nguyen, S. R. Pandey, K. Thar, and C. S. Hong are with the Department of Computer Science and Engineering, Kyung Hee University, Yongin-si 17104, South Korea. Email: {minhnhn, shashiraj, kyithar, cshong}@khu.ac.kr.

N. H. Tran is with the School of Computer Science, The University of Sydney, Sydney, NSW 2006, Australia. Email: [email protected].

M. Chen is with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA. Email: [email protected].

W. Saad is with Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24060, USA. Email: [email protected].

Corresponding Author: Choong Seon Hong ([email protected])

A. Towards a Large-scale Distributed Learning System

AI is moving towards edge devices with the availability of massively distributed data sources and the increase in computing power of handheld and wireless devices such as smartphones or self-driving cars. This has generated a growing interest in developing large-scale distributed machine learning paradigms [11]. In this regard, the edge computing paradigm provides the underlying infrastructure that empowers regional learning or device learning at the network's edge. However, traditional learning approaches cannot be readily applied to a large-scale distributed learning system. One promising approach to build a large-scale distributed learning system is through the use of the emerging federated learning (FL) framework [2]. In FL, on-device learning agents collaboratively train a global learning model without sharing their local datasets. The global model at the central server allows the local model of each agent to improve its learning performance; however, iteratively updating the global model based on the aggregation of local models can also have a negative impact on personalized performance [2]. For example, in a supervised FL setting, the local model is optimized to fit the local dataset, whereas the global model is built on the simple aggregation of local learning parameters (e.g., FedAvg [12], FedProx [13]) so as to perform well on the distributed dataset.

In practice, local datasets collected by each agent are unbalanced, statistically heterogeneous, and exhibit non-i.i.d. (non-independent and non-identically distributed) characteristics. Thus, the global model of FL can become biased and strongly affected by the agents who have more data samples or by those who perform larger update steps during the aggregation of local model parameters. Consequently, beyond a certain threshold number of training rounds, the generalized global model can negatively affect the personalized performance of several learning agents [2]. Hence, conventional FL cannot efficiently handle the underlying cohesive relation between the generalization and personalization (or specialization) abilities of the learning model in the testing and validation phases [2]. This raises an important, fundamental research question: How can one resolve the discrepancies between global and personalized accuracy? To answer this question and overcome the aforementioned limitations of existing FL frameworks, we seek to develop a novel design philosophy which can be widely used for future large-scale distributed learning systems. To the best of our knowledge, the work in [14] was the first attempt to study and improve the personalized performance of FL using a so-called personalized federated averaging (Per-FedAvg) algorithm based on MLF. However, in [14] the cohesive relation between generalization and personalization was not adequately analyzed. Recent work in [15] developed an analysis of personalization in FL applications using three approaches: hypothesis-based clustering, data interpolation, and model interpolation.

Fig. 1: Analogy of a hierarchical distributed learning system.
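To make the aggregation step concrete, the sketch below shows a FedAvg-style weighted average of local model parameters. This is our own minimal illustration, not code from the paper: `fedavg_aggregate` and its toy inputs are hypothetical, and a real FL system would aggregate full network weights rather than small vectors. The sample-count weighting makes explicit the bias toward data-rich agents discussed above.

```python
import numpy as np

def fedavg_aggregate(local_models, sample_counts):
    """FedAvg-style aggregation: average local parameter vectors,
    weighted by each agent's number of training samples."""
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    return sum(w * m for w, m in zip(weights, local_models))

# Toy usage: the two data-rich agents dominate the global model,
# pulling it away from the third (non-i.i.d.) agent.
models = [np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 2.0])]
counts = [1000, 800, 50]
print(fedavg_aggregate(models, counts))
```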

B. Lifelong Learning and Formation of a Hierarchical Structure

Inspired by the lifelong learning capability of biological intelligence and systems [16], we observe that both generalization and specialization capabilities are involved in building a large-scale distributed learning system. As is the case in any form of biological intelligence, we observe a continual developmental process: from stem cells to complex structures with multiple functionalities, such as the human brain. For example, in humans, the learning process consists of many stages, such as the newly born, childhood, and grown-up stages. The newly born stage is characterized by the generalization capability, with a high level of neurosynaptic plasticity [16] in the human brain. The synaptic plasticity level is intrinsically involved in consolidating knowledge for learning and adapting to the dynamic environment. In this learning stage, an individual can vastly learn basic functions/skills and abilities under the influence of social adaptation and education, or by leveraging curiosity and implicit or explicit rewards [17]. The transition from the newly born stage to the childhood stage is characterized by the individual's pursuit of a specialization capability over a set of already known basic skills, and by further exploration of the world with a hierarchical structure of generalized knowledge which can help to perform complex tasks. At the grown-up stage, individuals are better able to efficiently deal with highly complex tasks, i.e., they have a better adaptation ability in the known environment for solving complex learning tasks. However, they also lose the power of their generalization capabilities, i.e., it is harder for those individuals to learn new things, due to the increase of knowledge consistency over the various developmental stages [16].

In another observation, regarding the role of individual capabilities in the social structure development process, individuals contribute to society by resolving multiple complex tasks in a way similar to the learning agents in large-scale distributed learning systems. An individual exists as a unit entity with some basic survival objectives and functions/skills in the social hierarchy while interacting, contributing, and forming smaller groups such as a family. The conglomeration of families and relatives characterizes a society that behaves as a bigger group to resolve complex life issues and resolve/create conflicts. Subsequently, a union of such groups within a border demarcation represents a state, which has its own distinct legal regulations and social structures to solve complex social issues. The states form a global world organization, such as the United Nations, to maintain global harmony and solve overly complex global issues. Thus, many small groups unite to form a hierarchical structure for knowledge sharing and solving complex tasks. Here, the group formation process serves a common purpose of shared benefits among the members or the smaller social groups. Moreover, a structure analogous to human society results in the collective behavior of an interactive crowd, often characterized by swarm intelligence, which is well-observed in numerous biological systems [18]. Over time, such collaboratively-built social structures become stable and more consistent. Fig. 1 illustrates an analogy between the hierarchical structure in organizations (companies) and a hierarchical distributed learning system. From this figure, we can see that a global complex task can be accomplished through the cooperation of outcomes at each division. Following a similar analogy, a global learning task can be solved through the collaboration of each individual group's learning outcomes.

Fig. 2: Anatomy of Democratized Learning.

These observations provide sufficient hints about the underlying duality of the generalized and specialized processes in the entire development process of biological intelligence or systems. These processes eventually integrate many basic skills into complex skills; they start from generalized knowledge at a high level of plasticity and head towards more specialized knowledge at a high level of stability. This also raises an important question: How can one understand and formalize the duality of the generalized and specialized processes regarding the plasticity and stability of a distributed learning system?

C. Our Contributions

Existing distributed AI systems such as FL focus on building a federation of a central server and clients to construct a global model from the aggregation of personalized models, irrespective of the differences in agents' capabilities and the characteristics of their local datasets and learning tasks. Different from FL, we consider a self-organizing learning framework with a hierarchical structure for solving multiple complex learning tasks. Furthermore, considering the observations in Section I-B, we develop a novel philosophy to analyze the following research questions for collective learning using a large number of biased learning agents, who are committed to learning from limited datasets with limited learning capabilities:

• How can large-scale distributed AI systems be self-organized in a suitable hierarchical structure to perform knowledge sharing?

• How can learning knowledge be shared among learningagents and tasks?

• How can learning agents integrate the generalized knowledge to enhance their learning performance?

Learning in each agent can introduce bias due to the following scenarios: a) a limited number of samples in the local data, b) inadequate availability of features in the local data, c) unbalanced data (e.g., heterogeneity of labeled data in a classification problem [2]), or d) limited information (e.g., partial observation) regarding the environment [19]. In general, the direct consequence of all of these scenarios is that they produce biased personalized knowledge for each learning agent, which we refer to as a "biased agent". Thus, collaboration between agents is required to improve the generalized learning performance of all agents.

In a nutshell, our key contribution is to develop fundamental principles for building large-scale distributed AI systems as self-organizing hierarchical structures that consist of a very large number of biased individuals (learning agents). In our envisioned system, voluntary contributions from the learning agents enable collaborative learning that empowers the hierarchical structure to solve multiple complex tasks while supporting each agent to improve its learning performance. Furthermore, this collaboration will speed up the learning process for new members without prior learning knowledge and add benefits in expanding the generalized knowledge of the groups. Accordingly, we study the underlying dual specialized-generalized processes to develop a "philosophy and design" of a new distributed learning system, namely democratized learning, or Dem-AI for short. Furthermore, the design philosophy of Dem-AI opens up many new research challenges and extensions of the existing learning settings for FL, MTL, MLF, multi-agent reinforcement learning (MARL), and transfer learning. To provide a holistic view, we present the anatomy of Dem-AI as shown in Fig. 2.

Fig. 3: Conceptual architecture of the proposed democratized learning philosophy.

The rest of the paper is organized as follows.

We introduce the fundamental concepts and principles of the proposed democratized learning philosophy in Section II. A reference design of democratized learning is presented in Section III, followed by possible extensions of existing machine learning settings in Section IV. Section V concludes the paper.

II. DEMOCRATIZED LEARNING PHILOSOPHY

In this section, we introduce our Dem-AI philosophy, including its definition, the concepts shown in Fig. 3, and the principle related to the evolution of the self-organizing hierarchical structure, specialized learning, and generalization in the new democratized learning system.

A. Definitions, Goal, and Concepts

Definition and goal: Democratized Learning (Dem-AI for short) focuses on the study of the dual (coupled and working together) specialized-generalized processes in a self-organizing hierarchical structure of large-scale distributed learning systems. The specialized and generalized processes must operate jointly towards an ultimate learning goal, identified as performing collective learning from biased learning agents who are committed to learning from their own data using their limited learning capabilities.

As such, the ultimate learning goal of the Dem-AI system is to establish a mechanism for collectively solving common (single or multiple) complex learning tasks from a large number of learning agents. In the case of a common single learning task, the Dem-AI system aims to improve the aggregate accuracy of all agents, as done in federated learning. The ultimate goal in a conventional federated learning system is to minimize the average model loss of all agents for a single learning task. Moreover, for different learning settings and applications, the goal of Dem-AI systems can be derived from specific designs of the learning objectives. The learning agents can also collectively contribute to solving common multiple tasks as an ultimate goal of the Dem-AI system. For example, in the multi-task learning setting [20], the goal is to attain the overall learning performance of multiple tasks simultaneously. In meta-learning [9], the learning goal is to construct generalized knowledge that can efficiently deal with similar new learning tasks. Similarly, in reinforcement learning, the coordination task is defined to maximize the cumulative rewards for the joint actions taken by individual agents according to their partial observations [19].

Democracy in learning: The Dem-AI system features a unique characterization of participation in the learning process, and consequently develops the notion of democracy in learning, whose principles include the following:

• According to the differences in their characteristics, learning agents are divided into suitable groups that can be specialized for the learning tasks. These specialized groups are self-organized in a hierarchical structure to mediate voluntary contributions from all members in the collaborative learning for solving multiple complex tasks.

• The shared generalized learning knowledge supports specialized groups and learning agents to improve their learning performance by reducing individual biases during participation. In particular, the learning system allows new group members to: a) speed up their learning process with the existing group knowledge, and b) incorporate their new learning knowledge in expanding the generalization capability of the whole group.

• Learning agents are free to join suitable learning groups and exhibit equal power in constructing their groups' generalized learning model. The group power can be represented by the number of its members, which can vary over the training time.

Dem-AI Meta-Law: We define a meta-law as a mechanism that can be used to manipulate the transition between the dual specialized-generalized processes of our Dem-AI system. This meta-law is driven by two coincident primary forces: 1) a stability force, and 2) a plasticity force. Throughout the learning time, the transition mechanism adjusts the importance weights of these forces to empower the plasticity or the stability in specialized learning and generalization, as well as in the hierarchical structure of the Dem-AI system. The Dem-AI meta-law also provides the necessary information to regulate the self-organizing hierarchical structuring mechanism.

Specialized Process: This process is used to leverage the specialized learning capabilities of the learning agents and specialized groups by exploiting their collected data. This process also drives the hierarchical structure of specialized groups, with many levels of relevant generalized knowledge, to become stable and well-separated. Thus, with the addition of higher levels of generalized knowledge created by the generalization over all specialized group members, the learning agents can exploit their local datasets so as to reduce biases during personalized learning for a single learning task. Accordingly, the personalized learning objective has two goals: 1) to perform specialized learning, and 2) to reuse the available hierarchical generalized knowledge. Besides, the generalized knowledge can be incorporated through regularizers in the personalized learning objectives. Moreover, specialized learning can be performed as group learning when the members do not have learning capabilities on their own or when they are required to solve a coordination task. Notably, over time, the generalized knowledge becomes less important compared to the specialized learning goal, and a more stable hierarchical structure of specialized groups will form. These transitions are the direct consequence of the stability force, which is characterized by specialized knowledge exploitation and knowledge consistency, and which becomes stronger over time in the meta-law design of our Dem-AI principle.

Generalized Process: This process is used to regulate the generalization mechanism for all existing specialized groups, as well as the plasticity level of all groups. Here, group plasticity pertains to the ease with which learning agents can change their groups. The generalization mechanism encourages group members to become closer together when doing a similar learning task and sharing knowledge. This process enables the sharing of knowledge among group members and the construction of a hierarchical level of generalized knowledge from all of the specialized groups. Thereafter, the generalized knowledge helps the Dem-AI system maintain the generalization ability needed to efficiently deal with environment changes or new learning tasks. Hence, knowledge sharing is the mechanism to construct the generalized knowledge from similar and correlated learning tasks, for example, via model averaging in FL [12], knowledge sharing mechanisms in multi-task learning [21], and knowledge distillation [22]. Moreover, to resolve conflicts among excessively different specialized groups, an election mechanism can be adopted to reach consensus, or a union mechanism can be applied to maintain the diversity of the potential groups. Consequently, the hierarchical generalized knowledge can be constructed based on the contributions of the group members, which is driven by the plasticity force. This force is characterized by creative attributes, knowledge exploration, multi-task capability, and survival in uncertainty, and it becomes weaker over time in the meta-law design of the Dem-AI principle.

Self-organizing Hierarchical Structure: According to the transition between the two basic forces, as well as the necessary information in the meta-law, the hierarchical structure of specialized groups and the relevant generalized knowledge are constructed and regulated following a self-organization principle (e.g., hierarchical clustering [23]). This structure then evolves to become more stable: temporary small groups with high-level group plasticity will later unite to form a bigger group that enhances the generalized capability of all members. Thus, the specialized groups at higher levels in the hierarchical structure have more members and can construct more generalized (less biased) knowledge. Hierarchical modular networks can be found in the human brain as well as in the structures of human knowledge [24]. These hierarchical structures exhibit higher overall performance and evolvability (i.e., faster adaptation to new environments), as explained in [25].

Next, we establish the general principles that characterize the evolution of the underlying processes in Dem-AI.

B. Dem-AI Principle

Transition in the dual specialized-generalized process during training: Throughout the learning time, the specialized process becomes dominant over the generalized process in order to perform better in the training environment, following the Dem-AI meta-law design. This transition induces the following evolution principles of Dem-AI:

• P1: Evolution of specialized learning and generalization: The transition due to the duality of the two processes keeps the Dem-AI system evolving in order to provide a better adaptation ability for solving complex learning tasks during training. The Dem-AI system observes an incremental impact of specialized learning over the learning time and also loses the power of generalization, i.e., a decremental opportunity to deal with environment changes such as unseen data, new learning agents, and new learning tasks.

• P2: Evolution of the self-organizing hierarchical structure: The transition due to the duality of the two processes keeps the self-organizing hierarchical structure of the Dem-AI system evolving from a high level of plasticity to a high level of stability, i.e., from unstable specialized groups to well-organized and well-separated specialized groups.

In this transition, the separation of the specialized groups at each level is accelerated as a consequence of (P1), thereby increasing the resistance of learning agents to changing their groups. Meanwhile, the evolution of the self-organizing hierarchical structure (P2) accelerates the evolution of specialized learning and generalization (P1).

Throughout the training process, predefined goals such as maximizing rewards or minimizing learning loss enable the learning agents to attain higher performance in the fixed training environment. Learning agents therefore gain specialized capabilities for the fixed training environment and eventually reduce their generalized capability to adapt to changes in the applied environments. In this regard, the Dem-AI principle hypothesizes the transitions in (P1) and (P2), which help the learning agents to attain better learning goals in the training process of Dem-AI systems. We realize the mechanism defined in (P1) by controlling the parameters of the specialized learning objective of the learning agents and of the generalization, as described in the next section. The coherence with group plasticity is based on the stability of the specialized groups and the learning efficiency of specialized learning. In (P1), group plasticity accelerates the separation of the specialized groups at each level due to the differences in their learning characteristics (e.g., by updating models towards different learning directions). Thus, this process increases the resistance of learning agents to changing their groups. To speed up (P2), the Dem-AI system can also directly control the resistance to changes in the self-organizing hierarchical structuring mechanism.

Fig. 4: Illustration of the transition in the Dem-AI principle.

Fig. 5: The transition between plasticity and stability (a) and the self-organizing hierarchical structuring (b) in the Dem-AI system.

The transition following the Dem-AI principle in the dual specialized-generalized process is illustrated in Fig. 4. In this transition, the learning agents are grouped according to the similarities of their learning tasks at the early stage. Then, the generalized process helps in the construction of the hierarchical generalized knowledge for the specialized groups and encourages the group members to be close together. In the meantime, the specialized learning processes leverage the personalized and specialized group learning to exploit their biased datasets and deviate from the common generalized knowledge. The hierarchical structure becomes stable with the coexistence of well-separated, highly complex specialized groups that provide different, highly efficient specialized models for solving the complex learning tasks. This may, however, lead to "overfitting" in the training environment. Therefore, to deal with environment changes, we should properly control this dual process to achieve high specialized performance while preserving the generalized capabilities of the Dem-AI learning system.

III. REFERENCE DESIGN OF DEMOCRATIZED LEARNING

In the previous section, we introduced the fundamental concepts and general principles of the Dem-AI philosophy for democratized learning. In this section, we initiate a reference design with guidelines for the Dem-AI philosophy, inspired by observations of various interdisciplinary mechanisms in nature. Specifically, a Dem-AI system requires four essential mechanisms: a plasticity-stability transition mechanism, self-organizing hierarchical structuring, specialized learning, and generalization, which are presented in the following subsections.

A. Plasticity-Stability Transition Mechanism in Dem-AI Meta-Law

The transition between the plasticity and stability of Dem-AI systems in the meta-law design can drive the evolution of the specialized and generalized processes, following suitable mechanisms based on the characteristics of the learning systems. As shown in Fig. 5a, according to the Dem-AI principle, the specialized process with the incremental stability force becomes dominant over the generalized process with the decremental plasticity force. To implement this transition, we can approximate the whole learning process by different stages that change from a high level of plasticity to a high level of stability in specialized learning, generalization, and the self-organizing hierarchical structuring mechanism. Specifically, the meta-law can be designed and operated as a global rule at the global controller for the whole system. However, decentralization of the learning process requires a design that adds flexibility in controlling the parameters for the generalization mechanism (e.g., γ_t) and the specialized learning mechanism (e.g., α_t, β_t) at the group level or device level; these parameters are introduced in the next subsections. This way, we can avoid fixed global parameters in the meta-law that are applied to all of the learning agents and groups. Furthermore, these controllable parameters can depend on how long ago the groups were created and how long the agents have participated in the system.

Analogously, in physics, we associate the transition in our meta-law with the pendulum principle [26], which shows a transition from potential to kinetic energy through the energy conservation law and exhibits cyclic increments and decrements in sine forms. This analogy additionally reveals a hidden relationship between stability and plasticity that can inspire other suitable engineering mechanisms. Thereafter, we can incorporate Hebbian and homeostatic plasticity mechanisms, studied extensively in neuroscience [16], to regulate Dem-AI systems.
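To make this transition concrete, the sketch below shows one possible way a meta-law could schedule the weights α_t, β_t, and γ_t over training rounds. This is purely our own illustration under stated assumptions: the paper does not prescribe functional forms, and `meta_law_weights`, the `plateau` floor, and the damped sine modulation (in the spirit of the pendulum analogy) are all hypothetical choices.

```python
import math

def meta_law_weights(t, total_rounds, plateau=0.1, swings=4):
    """Illustrative plasticity-stability schedules at training round t.

    alpha_t (stability/specialization weight) grows monotonically,
    while beta_t and gamma_t (plasticity/generalization weights)
    decay, so specialized learning dominates late in training.
    A damped sine term mimics the pendulum analogy: early cyclic
    exchange between the two forces that dies out over time.
    """
    progress = min(t / total_rounds, 1.0)
    alpha_t = plateau + (1.0 - plateau) * progress
    beta_t = 1.0 - alpha_t
    swing = 0.5 * (1.0 - progress) * math.sin(2 * math.pi * swings * progress)
    gamma_t = min(1.0, max(0.0, beta_t + swing))
    return alpha_t, beta_t, gamma_t

# Example: weights at the start, middle, and end of 100 rounds.
for t in (0, 50, 100):
    print(t, meta_law_weights(t, total_rounds=100))
```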

B. Self-Organizing Hierarchical Structuring Mechanism

Fig. 5b shows the self-organizing hierarchical structuring mechanism that helps in constructing and maintaining the structure of many levels of specialized groups. This structuring process can be divided into three stages:

• Early Stage (hierarchical structure construction): The lowest-level groups are created by grouping the agents who perform a common learning task and have similar characteristics in their learning models. A new level can be created when the measured distances among the current groups are greater than a threshold pre-defined for each level. The structure can reach a maximum number of levels, which is defined in the Dem-AI meta-law. Alternatively, we can extend existing hierarchical clustering algorithms [23], clustering mechanisms for FL [15], [27], or game-theoretic mechanisms [28]–[30].

• Adaptation Stage: The adaptation stage allows the learning agents to change their groups. When the level of group plasticity is high, the measured distances among specialized groups are short, and, as such, agents can move among these groups.

• High Specialization Stage: The Dem-AI system allows micro-adjustments in the low-level groups, due to the well-separated and stable specialized groups that have already formed.

New learning agents first join a suitable specialized group at the top level of the learning system. In time, these agents will be admitted to lower-level specialized groups with whom they share similarities in their learning characteristics. Consequently, these agents leverage the existing group knowledge to speed up their learning process. Furthermore, with the availability of new data, the agents can contribute their valuable personalized knowledge to improve the generalization capability of the groups. Note that the metric of measured distance in the structuring mechanism can be derived from the differences in the characteristics of learning agents or specialized groups (e.g., in FL, the metric can be the Euclidean distance between the model parameters, gradients, or momenta of learning agents or groups). Therefore, the policy for the system to group different agents and to change groups can be defined by threshold metrics, measured as the differences between the learning agents and between the groups. The recent work in [31] provides a promising approach to analyze the similarity of layers in neural network representations based on centered kernel alignment (CKA).
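As a concrete sketch of this structuring step (our own illustration, using SciPy's standard agglomerative clustering rather than any algorithm fixed by the paper; `build_hierarchy` and the thresholds are hypothetical), agents can be grouped level by level from the Euclidean distances between their model parameters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_hierarchy(agent_models, level_thresholds):
    """Group agents into a multi-level hierarchy by model distance.

    agent_models: (n_agents, n_params) array of local model parameters.
    level_thresholds: increasing distance thresholds, one per level;
    a larger threshold merges more agents into fewer, broader groups.
    Returns one array of group labels per hierarchy level.
    """
    tree = linkage(agent_models, method="average", metric="euclidean")
    return [fcluster(tree, t, criterion="distance") for t in level_thresholds]

# Toy usage: six agents whose models fall into two clear clusters.
rng = np.random.default_rng(0)
models = np.vstack([rng.normal(0.0, 0.1, (3, 4)),
                    rng.normal(3.0, 0.1, (3, 4))])
levels = build_hierarchy(models, level_thresholds=[0.5, 10.0])
print(levels[0])  # fine-grained lowest-level groups
print(levels[1])  # one top-level group containing all agents
```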

In swarm intelligence systems, e.g., swarm robotics, the self-organized behaviors of a large number of robots can be coordinated to achieve robust, scalable, and flexible collective behaviors [18], which can be instrumental for our mechanism design. Similarly, in addition to the development process of a social structure, other suitable mechanisms, such as the growth process of a biological cell, can be incorporated. In biology, solid complex composite structures such as DNA can be separated after the initial coincidence period in the cell division process. For the well-separated groups that are formed, the agents who become excessively different through personalized learning (e.g., different gradients and personalized model parameters) or those who function poorly can be eliminated from their groups. Analogously, such mechanisms can be found in immune responses that destroy unhealthy cells (e.g., cancer or virus-infected cells) [32]. However, we can also consider that these agents behave as new learning agents and move towards other suitable groups.

Fig. 6: The specialized learning mechanism (a) and the hierarchical generalization mechanism (b) in the Dem-AI system.

C. Specialized Learning Mechanism

Specialized learning facilitates the personalized and specialized group learning capability using the existing hierarchical generalized knowledge, as represented in Fig. 6a. For this mechanism, we discuss the general design of personalized learning and specialized group learning, as well as related problems.

Personalized learning problem of learning agents: In Dem-AI, a personalized learning problem can be constructed for each learning agent with a personalized learning objective (PLO) that comprises: 1) a personalized learning goal (PLG), and 2) reusable generalized knowledge (GK). For example, the PLG is the learning loss function, and the GK is a regularizer defined by the difference between the new model parameters and the model parameters of the higher-level specialized groups (e.g., FEDL [5], FedProx [13]), i.e.,

    PLO = \alpha_t PLG + \beta_t GK = \alpha_t PLG + \beta_t \sum_k \frac{1}{N_g^{(k)}} GK(N_g^{(k)}),    (1)

where N_g^{(k)} and GK(N_g^{(k)}) are the number of agents and the generalized knowledge at specialized group level k, respectively. The higher levels of generalized knowledge are less important when solving any specific learning task of the agents than the lower-level specialized knowledge. Since the specialized process is more important for improving the specialized capability of the personalized learning model, the weight parameter β_t must be decreased in order to reduce the plasticity, while α_t must be increased, according to the meta-law design. If the learning agents cannot directly incorporate the generalized knowledge (e.g., they do not have the same model parameters), a special integration mechanism for the hierarchical structure of knowledge is required. Moreover, computing and communication resources and delay constraints also need to be considered in the learning problem. An example of the specialized learning problem, using a proximal term to constrain the local learning model such that it stays closer to the learning models of the higher-level groups, is defined as follows:

    \min_{w} \; \alpha_t L_n^{(0)}(w \,|\, D_n^{(0)}) + \beta_t \sum_k \frac{1}{N_g^{(k)}} \|w - w_n^{(k)}\|^2,    (2)

where L_n^{(0)} is the learning loss function of learning agent n for a classification task given its personalized dataset D_n^{(0)}, and w_n^{(k)} is the learning model of its higher-level group at level k [33]. Also, other knowledge transfer techniques, such as a multi-task regularizer [21] or a knowledge distillation regularizer [22], can be used to define the GK in the PLO of learning agents.
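A minimal numerical sketch of objective (2) follows. This is our own illustration, not the paper's implementation: `personalized_objective` and its toy inputs are hypothetical, and a real agent would minimize this objective with SGD over a neural network's parameters.

```python
import numpy as np

def personalized_objective(w, local_loss, group_models, group_sizes,
                           alpha_t, beta_t):
    """Evaluate the PLO of Eq. (2) for one agent.

    local_loss: callable giving the specialized loss L_n(w | D_n).
    group_models: the models w_n^(k) of the agent's group at each level k.
    group_sizes: N_g^(k) for those groups; the 1/N_g^(k) factor makes
    larger (higher-level, more generalized) groups pull more weakly.
    """
    proximal = sum((1.0 / n_k) * float(np.sum((w - w_k) ** 2))
                   for w_k, n_k in zip(group_models, group_sizes))
    return alpha_t * local_loss(w) + beta_t * proximal

# Toy usage: a quadratic local loss and a two-level hierarchy of groups.
w = np.array([0.5, -0.2])
local_loss = lambda w: float(np.sum((w - np.array([1.0, 0.0])) ** 2))
print(personalized_objective(w, local_loss,
                             group_models=[np.zeros(2), np.ones(2)],
                             group_sizes=[4, 20],
                             alpha_t=0.7, beta_t=0.3))
```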

Specialized learning problem of specialized groups: This problem arises when group members do not have adequate learning capabilities (e.g., IoT sensing devices with low computational capabilities), or when learning agents are required to solve a coordination task. For example, in practical IoT applications, a given learning system may not always be able to guarantee the participation of all clients in every communication round due to intermittent communication, battery drainage, or hardware ailments [34]. Therefore, a specialized group learning problem can be performed at the edge servers and/or fog nodes at the network's edge as done, for example, in the In-Edge AI framework [35]. The goal of this problem is to fit the collective datasets of all group members, where the group behaves as a virtual agent that solves the learning problem. Furthermore, specialized group learning can also have special decentralized learning structures (e.g., the shared critic network in multi-agent deep reinforcement learning (DRL) [36], or the meta-training phase in MLF [9]). Similar to the personalized learning problem, specialized group learning needs to be extended by leveraging the generalized knowledge from the higher-level specialized groups. Next, in order to achieve joint energy-learning efficiency given the limited communication and computation resources, the design of the group learning problem needs to consider a synergy of resource allocation, device scheduling, and learning performance. In doing so, the learning system can accommodate more group members and ensure more frequent model updates, thereby improving the group learning performance.

D. Generalization Mechanism

The generalization mechanism aims to collectively construct

the hierarchical generalized knowledge from all existing specialized groups or learning agents, as illustrated in Fig. 6b. Accordingly, the generalized knowledge extends the generalization capability of the Dem-AI system to learn new tasks or deal with environment changes more efficiently. For this purpose, we propose four strategies that are suitable for different levels: direct knowledge sharing, indirect knowledge sharing, election, and union, which can be fixed or realized from a categorical distribution over these strategies. At the lowest-level specialized groups, direct knowledge sharing among learning agents is possible due to the similarity of the learning task to be performed. At the higher-level specialized groups, indirect knowledge sharing among subgroups (i.e., transferred knowledge, meta-knowledge) becomes more probable due to the huge differences among specialized groups and the characteristics of their learning tasks. Throughout the learning process, the groups become more and more specialized to efficiently solve different complex learning tasks. Consequently, the generalized knowledge of the specialized groups becomes very different at higher levels. Thus, an election mechanism based on voting can help in reaching consensus among specialized groups. To this end, a union mechanism is designed as an ensemble of the collection of highly-specialized groups. This is a possible way to maintain the diversity of potential groups for the entire learning system. Basically, the diversity of potential groups plays a vital role in the learning system: it allows the preservation of ineffective specialized groups that have fewer members or show low performance in the training setting, but are potentially able to deal with changes in the training environment or new tasks. Therefore, the union and election mechanisms of Dem-AI are related to the diversity maintenance of biological species through natural selection and non-competitive processes (i.e., symbiosis) in the evolution process, or to the robustness of decentralized systems [37]. In addition to the measurement of efficiency, the robustness or diversity of the Dem-AI system can be measured and controlled throughout the training time following a validation procedure.

Hierarchical learning model parameter averaging for knowledge sharing: In the case of a shared generalized model among all group members, direct knowledge sharing can be designed as a hierarchical averaging of the generalized model parameters (GMP) at each level k as follows:

    GMP(N_g^{(k)}) = (1 - \gamma_t) GMP(N_g^{(k)}) + \gamma_t \sum_{i \in S} \frac{N_{g,i}^{(k-1)}}{N_g^{(k)}} GMP(N_{g,i}^{(k-1)}),    (3)

where S is the set of subgroups of a specialized group, N_{g,i}^{(k-1)} is the number of agents in subgroup i, and N_g^{(k)} is the total number of agents in the current specialized group at level k. This model averaging implementation is a typical aggregation mechanism adopted in several FL algorithms [12], [13]. The parameter γ_t controls the update frequency of the generalized knowledge; its value decreases over time as the members become well-specialized in their learning knowledge. Accordingly, the model parameters of the subgroups that have more agents become more important in the generalized model.
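A minimal NumPy sketch of update (3) follows (our own illustration; `update_group_model` and the toy numbers are hypothetical):

```python
import numpy as np

def update_group_model(group_model, subgroup_models, subgroup_sizes, gamma_t):
    """Hierarchical model averaging per Eq. (3).

    Blends a level-k group's current generalized model parameters (GMP)
    with the size-weighted average of its level-(k-1) subgroup models.
    As gamma_t decays over training, the generalized knowledge is
    updated less aggressively and the hierarchy stabilizes.
    """
    sizes = np.asarray(subgroup_sizes, dtype=float)
    weighted_avg = sum((n / sizes.sum()) * m
                       for n, m in zip(sizes, subgroup_models))
    return (1.0 - gamma_t) * group_model + gamma_t * weighted_avg

# Toy usage: a group whose two subgroups hold 30 and 10 agents;
# the larger subgroup contributes three times the weight.
group = np.zeros(3)
subs = [np.array([1.0, 1.0, 1.0]), np.array([-1.0, 0.0, 1.0])]
print(update_group_model(group, subs, subgroup_sizes=[30, 10], gamma_t=0.5))
```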

Knowledge distillation and knowledge transfer: For multiple complex tasks, the Dem-AI framework allows knowledge transfer across tasks in different domains by leveraging collaboration amongst learning agents in the hierarchical structure. In this regard, multi-task learning enables generalization by solving multiple relevant tasks simultaneously [20], [21], [38]. The work in [38] studied the relationships between jointly trained tasks and proposed a framework for task grouping in the MTL setting. Accordingly, the authors analyzed learning task compatibility in computer vision systems by evaluating task cooperation and competition. For example, a shared encoder and representation can be learned by training highly-correlated tasks together, such as semantic segmentation, depth estimation, and surface normal prediction. However, this framework is limited to analyzing multiple learning tasks for a single agent, whereas, in Dem-AI systems, a group of agents can train similar tasks in the low-level groups while highly related tasks are jointly trained in the high-level groups.

Furthermore, latent representations across different devices or groups are supported by adopting existing techniques of knowledge distillation, transfer learning, meta-knowledge construction, and specialized knowledge transfer. Knowledge distillation [22] and knowledge transfer among multiple tasks [10], [11] are important techniques to extend the capabilities of knowledge sharing. For example, in [22], knowledge distillation mechanisms such as exchanging model parameters, model outputs, and surrogate data are incorporated into distributed machine learning frameworks. Meanwhile, knowledge transfer has recently been studied in the federated MTL setting using different types of MTL regularization, such as cluster structures, probabilistic priors, and graphical models [10]. Moreover, the work in [11] forms a Bayesian network and uses variational inference methods, with a lateral connection design between the server and client models, to transfer knowledge among tasks. Different from these recent works, the conventional organizational knowledge creation theory [39] introduced a promising paradigm in which the new knowledge of an organization is articulated from the knowledge of individuals and self-organized in a hierarchical structure. Thus, the shared knowledge can be in an abstract form or an explicit combination of the individuals' knowledge through a conceptualization and crystallization process. In doing so, together with the hierarchical learning model parameter averaging, we can develop suitable knowledge sharing approaches for the generalization mechanism in our Dem-AI systems.
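For instance, a distillation-style GK term can be realized as the KL divergence between temperature-softened outputs of an agent's model and its group's model, in the spirit of [22]. The sketch below is our own and assumes a classification task with logit outputs; no shared architecture is required, only shared outputs.

```python
import numpy as np

def softened(logits, T):
    """Temperature-softened softmax distribution."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_regularizer(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened outputs: the group ("teacher")
    model guides the agent ("student") without requiring that they
    share model parameters, enabling indirect knowledge sharing."""
    p = softened(teacher_logits, T)
    q = softened(student_logits, T)
    return float(np.sum(p * np.log(p / (q + 1e-12))))

# Toy usage: the agent's outputs disagree with its group's on class 3.
print(distillation_regularizer([2.0, 0.5, -1.0], [1.5, 0.4, 0.8]))
```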



E. Example of Dem-AI Systems

More recently, the use of personalized applications, such as virtual assistants that can adhere to users' personalities, has gained significant attention. The goal of such intelligent systems is to learn the unique features and personalized characteristics during daily activities, make appropriate decisions for each user, and thereby enhance user interest. However, the main problem is the extraction of personalized features to perform knowledge transfer with limited local data. The Dem-AI system allows end-users and service providers to take part in a win-win solution: the service providers exploit users' knowledge to scale up their services, and the end-users collectively improve their personalized performance through knowledge sharing in a suitable group. For example, Google has provided a personalized virtual assistant (i.e., Google Now [40]) which can respond to users' questions with more relevant answers. Such reactive response systems can be extended to provide intelligent personalized recommendation services in a proactive manner. In this application, hierarchical recommendation models can be constructed following the Dem-AI mechanisms by leveraging shared features from different domains and from users/groups at different levels.

In addition, we present a novel multi-language handwriting recognition system based on our Dem-AI reference design, as shown in Fig. 7. A typical handwriting recognition application has an embedded virtual assistant to improve the capability of understanding human-written texts in various languages. However, to realize such systems, we need separate recognition models for each language (e.g., English and Korean). Using our Dem-AI reference design, agents undergo self-organization to form appropriate hierarchical regional/social groups that share similarities in the characteristics of their languages. By exploiting such structures, the learning system can collectively incorporate the personalized experiences of users to improve the generalized learning model. Subsequently, it empowers the recognition capability of each agent while increasing the importance of the specialized process in the system. This kind of application can scale up to a large number of agents and support multiple languages. Thus, it has the potential to integrate different voice recognition models to develop a fully supportive virtual assistant for each client. Therefore, we unleash limitless possibilities for employing the Dem-AI philosophy in future distributed AI applications and, in the next section, we discuss new research challenges.

Fig. 7: An example of Dem-AI systems: multi-language handwriting recognition.

IV. EXTENSION OF EXISTING MACHINE LEARNING SETTINGS AND CHALLENGES

The learning objective in the democratized learning setting for a large-scale distributed learning system cannot be readily solved by existing machine learning techniques. Further, the limited design considerations and frameworks for both the generalized and specialized capabilities of distributed learning models necessitate a radical change in our approach to creating efficient and more scalable learning systems. In the previous sections, as a first step to address these challenges, we established the Dem-AI philosophy and provided reference mechanisms inspired by interdisciplinary fields in nature. Accordingly, in the following subsections, we present new research challenges in developing future large-scale distributed learning systems that can leverage the Dem-AI philosophy, principle, and reference mechanisms.

A. Federated Learning towards Democratized Learning

Naturally, the FL setting for a single learning task can be extended to multiple complex learning tasks with a very large number of biased learning agents in a democratized learning system. In addition, the learning agents' biases due to the characteristics of limited personalized data constitute a more general setting than the current non-i.i.d. use cases of FL. Using the personalized learning problem (2), the personalized model can incorporate the models of the higher-level generalized groups by using proximal terms [33]. Also, the hierarchical averaging of learning model parameters in (3) can help agents share their learning knowledge and construct more generalized knowledge for the groups. Thereafter, the self-organizing hierarchical structuring mechanism in Dem-AI can better adhere to the differences and similarities in the agents' learning, which can be a promising direction to solve the problem of personalization and generalization more efficiently. Moreover, Dem-AI also provides a better mechanism to handle newly arriving learning agents or to deal with changes in the agents (e.g., a change in their local datasets), owing to the properties of the self-organizing hierarchical structure and the underlying dual processes. By moving new agents to suitable specialized groups, Dem-AI leverages new personalized knowledge for those groups, where the new members can also reuse the current specialized group knowledge.

In the current practical setting of an FL system, only a subset of the available agents is chosen during each training round. This procedure leads to a very high level of group plasticity. Therefore, it is challenging to build a stable system having many levels of the hierarchical structure; in such a case, the number of levels in the hierarchical structure can be limited to a small number. The two essential corresponding research questions that can potentially revolutionize FL towards Dem-AI systems are: 1) How can we design a suitable self-organizing hierarchical structuring mechanism? 2) How can we better leverage the generalized knowledge by using a new hierarchical averaging mechanism or other relevant knowledge sharing approaches? In addition, we can extend the Dem-AI philosophy to other distributed learning systems that are analogous to FL, such as the brainstorming generative adversarial network (GAN) system proposed in [41], which applies FL-like principles to generative models rather than inference models.

B. Cooperative Multi-Agent Reinforcement Learning

The setting of cooperative MARL discussed in [19] requires the coordination of decentralized policies to solve complex tasks under the partial observability of each agent n (e.g., different fields of view). In fact, shared common knowledge and hierarchical policy designs arise naturally in decentralized cooperative tasks, as discussed in [19]. A cooperative reward $r(s, a_{\text{joint}})$ is defined as a function of the joint action $a_{\text{joint}} := (a_1, \ldots, a_n)$. Several other approaches, such as the centralized-critic, decentralized-execution design in MADDPG [36], the hierarchical critic in [42], and the feudal framework in [43], introduce various designs for decentralized operation in cooperative reinforcement learning. However, these frameworks mainly analyze two levels of the hierarchical structure with a small number of agents, which may be suitable only for learning within a single group. As a result, scaling up these designs to a large number of learning agents that must perform multiple coordination tasks would be very challenging. Existing federated reinforcement learning designs [44], [45] can be readily incorporated with our mechanisms, as discussed in prior sections.

Nevertheless, in order to fully realize the Dem-AI principles and mechanisms, we must overcome two key challenges: a) how to develop novel similarity metrics for group formation (e.g., metrics based on observations, tasks, or goals), and b) how to realize suitable multi-level cooperation for knowledge acquisition among groups of agents. One candidate metric for challenge a) is sketched below.
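The following is a purely illustrative sketch, not a metric proposed in [19] or [36]: it compares two agents through the cosine similarity of their mean observation vectors, so agents with similar fields of view could become candidates for the same specialized group. The name `observation_similarity` is an assumption.

```python
import numpy as np

def observation_similarity(obs_a, obs_b):
    # obs_a, obs_b: (timesteps, obs_dim) arrays of recorded observations.
    # Returns cosine similarity of the mean observation vectors, in [-1, 1].
    mu_a, mu_b = obs_a.mean(axis=0), obs_b.mean(axis=0)
    denom = np.linalg.norm(mu_a) * np.linalg.norm(mu_b) + 1e-12
    return float(mu_a @ mu_b / denom)
```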

Furthermore, current MARL systems are poorly equipped to handle environmental changes, such as the deployment of new agents or deployment in different environments. We believe that the democratized learning philosophy and the presented reference mechanisms can provide more flexible approaches for controlling exploration and exploitation capabilities through a self-organizing hierarchical structure of agents. This can help MARL systems such as those in [19] and [36] evolve towards more scalable and powerful designs. Dem-AI also provides an opportunity to collectively train each agent on multiple basic DRL tasks, after which the collective knowledge can help specialized groups of agents solve more difficult tasks. Subsequently, such decentralized autonomous systems can be widely applied to handle multiple complex tasks in future applications. In our recent work [46], we demonstrated a simple Dem-AI principle for DRL whereby a DRL agent gains “experience” of extreme events (which can be seen as a specialized process) by training over a GAN-based system. Building on this early work, one can construct a more elaborate MA-DRL system under the Dem-AI umbrella.

C. Multi-Task Learning and Meta-Learning

The current settings of MTL and meta-learning are restricted to training similar or strongly correlated learning tasks, and they focus on maintaining a certain level of generalization and performance for each learning task. Recent federated MTL frameworks such as those in [10] and [11] resolve the statistical challenges of user-dependent data distributions in classical FL by assigning each client a different but similar learning task. A general formulation in [10] introduces a trainable correlation matrix Ω between tasks as follows:

\min_{w,\Omega}\; \sum_{n} L_n\!\left(w \mid D_n^{(0)}\right) + R(w,\Omega), \qquad (4)

where a variety of regularizer functions R can be implemented for the clustered multi-task learning problem [21], [47]; one common choice is sketched below. Thereafter, we can incorporate the proximal terms in (2) and the hierarchical averaging in (3) into this federated MTL framework and build hierarchical generalization by grouping similar or strongly correlated tasks.
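As a hedged sketch of (4), the function below evaluates the objective with one regularizer that is common in the clustered-MTL literature, $R(W,\Omega) = \lambda\, \mathrm{tr}(W \Omega^{-1} W^{\top})$; this specific R, the name `federated_mtl_objective`, and the value of λ are illustrative assumptions, not necessarily the exact choices in [10].

```python
import numpy as np

def federated_mtl_objective(W, Omega, task_losses, lam=0.1):
    # W: (n_tasks, d) stacked per-task weight vectors w_n.
    # Omega: (n_tasks, n_tasks) positive-definite task-correlation matrix.
    # task_losses: sequence of per-task empirical losses L_n(w_n | D_n^(0)).
    data_term = float(np.sum(task_losses))
    reg = lam * float(np.trace(W @ np.linalg.inv(Omega) @ W.T))
    return data_term + reg
```

Intuitively, a learned Ω that strongly correlates two tasks makes it cheap for their weight vectors to stay close, which is exactly the coupling a grouping mechanism could exploit.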

Similarly, to turn the existing federated MLF frameworks in [14], [48] into practical large-scale distributed learning systems, Dem-AI mechanisms are also needed to efficiently handle new tasks, which can then join a suitable group of correlated tasks instead of having to be similar to all of the training tasks (see the routing sketch below). Therefore, grouping a large number of tasks in a self-organizing hierarchical structure, with different levels of knowledge transfer, is a promising design for extending MLF and MTL frameworks towards large-scale systems.
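A minimal sketch of such task routing, under the assumption that each specialized group is summarized by a centroid embedding (the name `assign_new_task` and the threshold rule are hypothetical):

```python
import numpy as np

def assign_new_task(task_embedding, group_centroids, threshold=0.5):
    # Join the most similar existing group if the cosine similarity clears
    # a threshold; otherwise signal that a new specialized group is needed.
    C = np.stack(group_centroids)
    C = C / (np.linalg.norm(C, axis=1, keepdims=True) + 1e-12)
    t = task_embedding / (np.linalg.norm(task_embedding) + 1e-12)
    sims = C @ t
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else -1  # -1: spawn a new group
```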


D. Transfer Learning

Transfer learning is an important technique that can facilitate knowledge sharing among specialized group members and across multiple tasks in democratized learning. Generalized knowledge can be transferred, directly or indirectly, between tasks with similar characteristics, for instance via in-group sharing based on recent works on federated distillation [22] or novel GAN designs for distributed datasets [41], [49]. However, highly specialized knowledge is often difficult to transfer due to the dissimilarity and incompatibility of that knowledge. Therefore, we need novel approaches to extract transferable specialized knowledge from different learning tasks. Moreover, a consensus over incompatible knowledge can be reached among members or specialized groups through the election and union mechanisms. Hence, exploiting the hierarchical structure in transfer learning is a promising research direction that will enable the democratized learning system to solve multiple complex learning tasks more efficiently; a minimal distillation sketch follows.
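To make the in-group sharing idea concrete, here is a hedged sketch of the basic ingredient behind federated distillation-style approaches such as [22]: a student model matches the group members' averaged soft predictions. The temperature value and the function names are illustrative assumptions, not the exact scheme of [22].

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def group_distillation_loss(student_logits, member_logits, T=2.0):
    # Teacher signal: average of the group members' softened predictions;
    # loss: KL(teacher || student), averaged over the batch.
    teacher = np.mean([softmax(m, T) for m in member_logits], axis=0)
    student = softmax(student_logits, T)
    kl = np.sum(teacher * (np.log(teacher + 1e-12)
                           - np.log(student + 1e-12)), axis=-1)
    return float(np.mean(kl))
```

Because only predictions (not model parameters) are exchanged, this style of sharing can also bridge groups whose model architectures are incompatible.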

V. CONCLUSION AND FUTURE WORKS

Existing machine learning designs face critical challenges in scaling up current centralized AI systems into distributed AI systems that can perform multiple complex learning tasks. In this paper, we have established the principles of a novel democratized learning setting, dubbed Dem-AI, while reviewing and incorporating natural design considerations for distributed machine learning systems. As an initial step, we first established a natural design approach using the Dem-AI philosophy and its reference mechanisms, drawn from various interdisciplinary fields, for large-scale distributed learning systems. In particular, we have presented the evolution of the specialized and generalized processes and the formation of the self-organizing hierarchical structure in the Dem-AI principle. Next, building on Dem-AI, we have introduced possible extensions of existing machine learning settings and new challenges for existing learning approaches, toward more scalable and flexible learning systems. The effects of the transitions in the Dem-AI principle should be further validated in specific learning settings and applications; we leave this validation analysis of the proposed Dem-AI principle for future work.

ACKNOWLEDGMENT

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A4A1018607).

REFERENCES

[1] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial neural networks-based machine learning for wireless networks: A tutorial,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3039–3071, Fourth quarter 2019.

[2] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” arXiv:1912.04977, 2019.

[3] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, “Federated learning over wireless networks: Optimization model design and analysis,” in IEEE Conference on Computer Communications (INFOCOM), Paris, France, April 29–May 2, 2019, pp. 1387–1395.

[4] S. R. Pandey, N. H. Tran, M. Bennis, Y. K. Tun, A. Manzoor, and C. S. Hong, “A crowdsourcing framework for on-device federated learning,” IEEE Transactions on Wireless Communications, 2020.

[5] C. Dinh, N. H. Tran, M. N. H. Nguyen, C. S. Hong, W. Bao, A. Y. Zomaya, and V. Gramoli, “Federated learning over wireless networks: Convergence analysis and resource allocation,” arXiv:1910.13067, 2019.

[6] L. U. Khan, N. H. Tran, S. R. Pandey, W. Saad, Z. Han, M. N. H. Nguyen, and C. S. Hong, “Federated learning for edge networks: Resource optimization and incentive mechanism,” arXiv:1911.05642, 2019.

[7] M. Chen, Z. Yang, W. Saad, C. Yin, H. V. Poor, and S. Cui, “A joint learning and communications framework for federated learning over wireless networks,” arXiv:1909.07972, 2019.

[8] M. Chen, H. V. Poor, W. Saad, and S. Cui, “Convergence time optimization for federated learning over wireless networks,” arXiv:2001.07845, 2020.

[9] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, Aug. 2017, pp. 1126–1135.

[10] V. Smith, C.-K. Chiang, M. Sanjabi, and A. S. Talwalkar, “Federated multi-task learning,” in Advances in Neural Information Processing Systems 30, Dec. 2017, pp. 4424–4434.

[11] L. Corinzia and J. M. Buhmann, “Variational federated multi-task learning,” arXiv:1906.06268, 2019.

[12] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics, 2017, pp. 1273–1282.

[13] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith, “Federated optimization in heterogeneous networks,” in Proceedings of Machine Learning and Systems 2020, 2020, pp. 429–450.

[14] A. Fallah, A. Mokhtari, and A. Ozdaglar, “Personalized federated learning: A meta-learning approach,” arXiv:2002.07948, 2020.

[15] Y. Mansour, M. Mohri, J. Ro, and A. T. Suresh, “Three approaches for personalization with applications to federated learning,” arXiv:2002.10619, 2020.

[16] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, “Continual lifelong learning with neural networks: A review,” Neural Networks, vol. 113, pp. 54–71, 2019.

[17] P.-Y. Oudeyer, “Computational theories of curiosity-driven learning,” arXiv:1802.10546, 2018.

[18] M. Brambilla, E. Ferrante, M. Birattari, and M. Dorigo, “Swarm robotics: A review from the swarm engineering perspective,” Swarm Intelligence, vol. 7, no. 1, pp. 1–41, 2013.

[19] C. Schroeder de Witt, J. Foerster, G. Farquhar, P. Torr, W. Boehmer, and S. Whiteson, “Multi-agent common knowledge reinforcement learning,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019, pp. 9927–9939.

[20] S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv:1706.05098, 2017.

[21] Y. Zhang and Q. Yang, “A survey on multi-task learning,” arXiv:1707.08114, 2017.

[22] J. Park, S. Wang, A. Elgabli, S. Oh, E. Jeong, H. Cha, H. Kim, S.-L. Kim, and M. Bennis, “Distilling on-device intelligence at the network edge,” arXiv:1908.05895, 2019.

[23] G. Karypis, E. H. Han, and V. Kumar, “Chameleon: Hierarchical clustering using dynamic modeling,” Computer, vol. 32, no. 8, pp. 68–75, Aug. 1999.

[24] P. Zurn and D. S. Bassett, “Network architectures supporting learnability,” Philosophical Transactions of the Royal Society B, vol. 375, no. 1796, p. 20190323, 2020.

[25] H. Mengistu, J. Huizinga, J.-B. Mouret, and J. Clune, “The evolutionary origins of hierarchy,” PLoS Computational Biology, vol. 12, no. 6, p. e1004829, 2016.

[26] S. J. Ling, J. Sanny, W. Moebs et al., University Physics Volume 1. OpenStax, 2016.

[27] F. Sattler, K.-R. Müller, and W. Samek, “Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–13, 2020.

[28] Z. Han, D. Niyato, W. Saad, and T. Basar, Game Theory for Next Generation Wireless and Communication Networks: Modeling, Analysis, and Design. Cambridge University Press, 2019.

[29] L. Rose, E. V. Belmega, W. Saad, and M. Debbah, “Pricing in heterogeneous wireless networks: Hierarchical games and dynamics,” IEEE Transactions on Wireless Communications, vol. 13, no. 9, pp. 4985–5001, Sep. 2014.


[30] W. Saad, Z. Han, M. Debbah, A. Hjørungnes, and T. Basar, “Coalitional game theory for communication networks,” IEEE Signal Processing Magazine, vol. 26, no. 5, pp. 77–97, Sep. 2009.

[31] S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, “Similarity of neural network representations revisited,” arXiv:1905.00414, 2019.

[32] A. Iannello and D. H. Raulet, “Immune surveillance of unhealthy cells by natural killer cells,” in Cold Spring Harbor Symposia on Quantitative Biology, vol. 78, 2013, pp. 249–257.

[33] M. N. H. Nguyen, S. R. Pandey, T. Nguyen D., E. N. Huh, C. S. Hong, N. H. Tran, and W. Saad, “Self-organizing democratized learning: Towards large-scale distributed learning systems,” arXiv:2007.03278, 2020.

[34] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, “Robust and communication-efficient federated learning from non-IID data,” IEEE Transactions on Neural Networks and Learning Systems, 2019.

[35] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, “In-edge AI: Intelligentizing mobile edge computing, caching and communication by federated learning,” IEEE Network, vol. 33, no. 5, pp. 156–165, 2019.

[36] R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Advances in Neural Information Processing Systems 30, Dec. 2017, pp. 6379–6390.

[37] N. L. Johnson, “Diversity in decentralized systems: Enabling self-organizing solutions,” in Decentralization II Conference, UCLA, 1999.

[38] T. Standley, A. R. Zamir, D. Chen, L. Guibas, J. Malik, and S. Savarese, “Which tasks should be learned together in multi-task learning?” arXiv:1905.07553, 2019.

[39] I. Nonaka, “A dynamic theory of organizational knowledge creation,” Organization Science, vol. 5, no. 1, pp. 14–37, 1994.

[40] “Google Now,” https://www.wordstream.com/google-now.

[41] A. Ferdowsi and W. Saad, “Brainstorming generative adversarial networks (BGANs): Towards multi-agent generative models with distributed private datasets,” arXiv:2002.00306, 2020.

[42] Z. Cao and C.-T. Lin, “Reinforcement learning from hierarchical critics,” arXiv:1902.03079, 2019.

[43] S. Ahilan and P. Dayan, “Feudal multi-agent hierarchies for cooperative reinforcement learning,” arXiv:1901.08492, 2019.

[44] C. Nadiger, A. Kumar, and S. Abdelhak, “Federated reinforcement learning for fast personalization,” in 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE). IEEE, 2019, pp. 123–127.

[45] X. Wang, C. Wang, X. Li, V. C. Leung, and T. Taleb, “Federated deep reinforcement learning for Internet of Things with decentralized cooperative edge caching,” IEEE Internet of Things Journal, 2020.

[46] A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, “Experienced deep reinforcement learning with generative adversarial networks (GANs) for model-free ultra reliable low latency communication,” arXiv:1911.03264, 2019.

[47] J. Zhou, J. Chen, and J. Ye, “Clustered multi-task learning via alternating structure optimization,” in Advances in Neural Information Processing Systems, 2011, pp. 702–710.

[48] F. Chen, M. Luo, Z. Dong, Z. Li, and X. He, “Federated meta-learning with fast convergence and efficient communication,” arXiv:1802.07876, 2018.

[49] C. Hardy, E. Le Merrer, and B. Sericola, “MD-GAN: Multi-discriminator generative adversarial networks for distributed datasets,” in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Brazil, May 2019, pp. 866–877.