An Inter-Cloud Meta-Scheduling (ICMS) Simulation Framework: Architecture and Evaluation

Inter-cloud is an approach that facilitates scalable resource provisioning across multiple cloud infrastructures. In this paper, we focus on the performance optimization of Infrastructure as a Service (IaaS) using the meta-scheduling paradigm to achieve improved job scheduling across multiple clouds. We propose a novel inter-cloud job scheduling framework and implement policies to optimize the performance of participating clouds. The framework, named Inter-Cloud Meta-Scheduling (ICMS), is based on a novel message exchange mechanism that allows the optimization of job scheduling metrics. The resulting system offers improved flexibility, robustness and decentralization. We implemented a toolkit named "Simulating the Inter-Cloud" (SimIC) to perform the design and implementation of the different inter-cloud entities and policies in the ICMS framework. An experimental analysis is presented for job executions in an inter-cloud setting, covering performance parameters such as job execution, makespan, and turnaround times. The results highlight that the overall performance of individual clouds for the selected parameters and configuration is improved when these clouds are brought together under the proposed ICMS framework.


INTRODUCTION
The concept behind cloud computing is to provide an on-demand, scalable and agile infrastructure. Its biggest advantage is service elasticity, which offers scaling of the cloud resources based on user demand [7]. In this work, we focus on the inter-cloud, an infrastructure that exploits communication across multiple clouds to support a diverse and large number of user requests. The inter-cloud aims to increase cloud service elasticity and scalability while minimizing operational costs. Inter-clouds allow the formation of a collaborative partnership for service exchange under a mutually agreed management while ensuring a certain level of Quality of Service (QoS). In particular, an inter-cloud facilitates communication by acting as a gateway and broker between different cloud providers. In this work we propose an inter-cloud framework that optimises the performance of an infrastructure that may comprise multiple clouds.
In order to realize an inter-cloud, meta-scheduling may play an important role in the way resources are managed and requests are processed. Specifically, a meta-scheduler could select available resources from multiple clouds taking into account appropriate Service Level Agreements (SLAs), operating conditions (e.g. cost, availability) and performance criteria [4]. This requires resources from multiple clouds to be orchestrated in such a way that tasks are efficiently executed. Our goal is to take advantage of already developed solutions for large-scale meta-scheduling and implement an Inter-Cloud Meta-Scheduling (ICMS) framework that can improve performance metrics including task execution times, latencies and makespan times by exploiting resources from multiple clouds.
The work is motivated by the future of Internet computing as described in [6]. Specifically, the authors note that today there are different cloud providers that address different needs and may offer different functionality, yet they share the same characteristics in terms of how resources are provisioned and consumed. These clouds share similarities in structure and architecture. Inter-cloud models should allow tasks to be exchanged in order to achieve better QoS levels by exploiting the resources of a number of cloud providers through novel meta-scheduling approaches. In this work we address a limitation of current cloud implementations: they do not offer support for task federation.
In contrast to other efforts, as described in [2] and [4], we propose a more inclusive design that provides task federation through a decentralized meta-scheduling solution. Each cloud infrastructure may have its own local scheduler, which may not have information about resources in other clouds. This work extends the initial effort in [15] by presenting the complete architecture along with new algorithms and the messaging model of ICMS. Further, the experimental study demonstrates an extended use of performance metrics based on new algorithms and use cases for evaluation purposes. Also, the new algorithms and performance evaluation experiments have been produced in the newest version of SimIC, which realizes the inter-cloud algorithms along with the messaging model. Its architecture is based on CloudSim, yet it implements and extends its features from the perspective of processing batch jobs in meta-scheduling systems.

----------------
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, MANUSCRIPT ID
S.S. Author is with the Technical University of Crete (TUC), Chania, Crete, Greece. E-mail: s.sotiriadis@intelligence.tuc.gr
N.B. Author is with the University of Derby, Kedleston Road, Derby, DE22 1GB. E-mail: n.bessis@derby.ac.uk
A.A. Author is with the University of Derby, Kedleston Road, Derby, DE22 1GB. E-mail: a.anjum@derby.ac.uk
R.B. Author is with the University of Melbourne, Australia. E-mail: rbuyya@unimelb.edu.au
Manuscript received 22 July 2014.
The paper is organized as follows: Section 2 presents a discussion of related works. Section 3 details the proposed ICMS framework and Section 4 presents the SimIC simulation toolkit and the experimental configuration. Section 5 details the performance results and evaluation. The work concludes in Section 6 with a summary and a discussion of future directions.

RELATED WORKS
The inter-cloud has been characterized as a large-scale resource management system comprising multiple autonomous clouds [5]. These independently managed clouds may be homogeneous or heterogeneous, yet in an inter-cloud infrastructure they will need to function under a single federated management entity. This section focuses on a literature review of the meta-scheduling approaches developed for large-scale systems that may exhibit characteristics similar to inter-clouds. In particular, we focus on the algorithms with regard to inter-clouds.
The work of [8] presents a decentralized dynamic algorithm named the Estimated Load Information Scheduling Algorithm (ELISA). The algorithm estimates the queue length of neighboring processors and then reschedules loads based on these estimates. The method aims to improve load balancing through estimation based on information updated after large time intervals. Yet, the method is not adaptable to the inter-cloud, as the algorithm requires the queue lengths of neighboring hosts and consequently exposes internal information. In [9], the authors demonstrate a distributed computing scheduling model. The key idea of the proposed meta-scheduler is to redundantly distribute each service to multiple sites, instead of sending the service to the most lightly loaded one. We envision that inter-clouds will mainly be used for highly loaded scenarios; therefore this method will decrease the overall performance.
The work of [10] presents a model for connecting various Condor work pools that yields a self-organizing flock of Condors. The model uses the Condor resource manager to schedule services to idle resources. This method, similar to [8], includes comparison of queues, so it exposes local information and is considered not adoptable to inter-clouds. The authors conclude that it performs better for lightly loaded sites, so, as with [9], this will also decrease the overall performance. The authors in [11] present a scheduling infrastructure called OurGrid, which is based on Bag-of-Tasks applications. OurGrid is a collection of peers constituting a community. This is a decentralized solution based on site reputation and debts. As debts grow, services could become less prioritized, which could lead to starvation and in turn could affect inter-cloud performance. In [12], the authors discuss a market-based resource allocation system. The scheduling mechanism in this system is based on auctions. Specifically, each resource provider or owner runs an auction for their resources. However, this does not guarantee an optimized inter-cloud solution, as resources can be under-utilized due to meta-schedulers that might always bid for a specific set of resources.
In [29], the authors describe two scheduling algorithms, namely Modified ELISA (MELISA), based on [8], and load balancing on arrival. Both algorithms are based on the distributed scheme of sender-initiated load balancing. To improve MELISA performance, the authors conclude that the load-balancing-on-arrival method will balance high service arrival rates. However, this solution includes the exchanging of local queues as discussed in [8], and is thus inefficient with regard to inter-clouds. The delegated matchmaking (DMM) approach presented in [13] is a novel delegation technique that allows the interconnection of several grids without requiring operation under a central control point. Their simulation results show that DMM can have significant performance and administrative advantages. However, this work raises heterogeneity issues in large-scale distributed settings. In [17], the authors present a model for an InterGrid system that enables distributed resource provisioning from local to global scale. In [18], the authors evaluate the performance of the InterGrid architecture using various algorithms, e.g. conservative backfilling. The results show that the average response time improved under the evaluated scheduling algorithms. Yet, [19] suggests that the approach reflects an economical view, as business application is the primary goal.
In [19], the authors present a decentralized model for addressing scheduling issues in federated grids. This solution proposes the utilization of GridWay as a meta-scheduler for each grid infrastructure. The authors assume a complete setting in terms of meta-brokers' knowledge of each other, which makes it appropriate for small-scale settings and not for large-scale inter-clouds.
The work of [20] presents the problem of broker selection in multiple-grid scenarios by describing and evaluating several scheduling techniques. In particular, system entities such as meta-brokers are represented as gateways. The authors claim that performance is not penalized significantly; however, resource information accuracy may be lost. This work did not address these meta-scheduling features in inter-clouds. The work of [23] introduces a decentralized dynamic scheduling approach called the community-aware scheduling algorithm (CASA). CASA contains a collection of sub-algorithms to facilitate service scheduling across distributed nodes. The message distribution is based on the probability of finding a resource, and thus requires training of the system to define probabilities. In this study, ICMS defines algorithms for dynamic scheduling that go beyond exchanging local scheduling queues. In [33], the authors present a scalable cloud system modeled around the Amazon EC2 architecture, with a workload model that offers fluctuating traffic characteristics. Table 1 summarizes the large-scale scheduling approaches by extending the work in [32]; in [6], the authors present a detailed theoretical comparison among these approaches.

TABLE 1
SUMMARY OF LARGE-SCALE SCHEDULING APPROACHES

[8] ELISA. Reschedules loads based on estimated queue lengths of neighboring processors. Limitation: adaptability to system dynamics cannot be guaranteed, privacy issues are exposed due to queue exchanging (comparison of node loads), and virtualization capabilities are not included.

[9] Redundant distribution. The scheduler redundantly distributes each job to multiple sites, instead of sending the job to the most lightly loaded one, through backfilling. Advantage: increases the possibility of effective backfilling and brings better fairness. Limitation: performs best for lightly loaded sites, offers lower overall performance for large-scale systems, and has no virtualization capability.

[10] Condor flocking. Connects various Condor pools into a self-organizing flock of Condors; schedules jobs to idle resources using the Condor resource manager and invokes the flocking mechanism only for busy machines. Advantage: flocking can reduce the maximum job waiting time in the queue. Limitation: pools are characterized as suitable/not suitable, so unfairness will lead to starvation; comparison of queue lengths exposes privacy issues; virtualization is not determined.

[11] OurGrid. Scheduling is driven by site reputation and resource availability; brokers schedule jobs through arrangements and priorities to peers, where each peer can maintain a ranking of all known peers. Advantage: a fully decentralized solution in which peers keep track of a local balance for each known peer based on past interactions. Limitation: as debts grow, jobs become less prioritized, which could lead to starvation; resources can also be under-utilized due to meta-schedulers bidding for specific resources.

[13] DMM. Temporarily binds local resources to remote resources, through delegated matchmaking, when a user cannot be satisfied at the local level; remote resources are added transparently. Advantage: improved performance by reducing administrative overhead, with no central control point. Limitation: the dynamics of the system are ignored, as a steady state is assumed during simulation; heterogeneity and virtualization issues are not fully considered.

[18] InterGrid. Interlinks grid islands using peering arrangements and gateways to allow cross-collaboration among various grids. Advantage: evaluates the performance of four complex algorithms and shows an improvement in average response times. Limitation: system dynamics may affect the connections of grid islands (e.g. failures could happen during communication), and brokers are self-interested rather than global.

[19] GridWay. A meta-scheduler called GridWay sits on top of each grid infrastructure in the federated grid; four algorithms have been developed that can be executed in GridWay. Advantage: no requirement for information about remote nodes; past performance is considered to forecast new objectives. Limitation: only adoptable for a specific information system, as it requires a training mechanism for forecasting performance; the overhead during training may also increase.

[20] Meta-broker selection. A meta-broker selection process for multiple interoperating grids; the scheduling policy consists of the bestBrokerRank policy. Advantage: improves workloads, resource utilization and load balancing among different grids. Limitation: assumes complete and detailed resource information sharing in a stable infrastructure.

[23] CASA. A dynamic scheduling approach that takes scheduling decisions for jobs across decentralized distributed nodes. Advantage: can achieve the same number of executed jobs in a decentralized setting as in a centralized one. Limitation: job distribution is based on the probability of finding a resource, and thus requires training of the system to define probabilities.
An important characteristic of our approach is message exchanging, which is considered a key requirement by most decentralized approaches, as reported in [9], [10], [11], [12], and [19]. However, most of these approaches do not detail the whole request-response procedure. For example, [24] suggests that messages are exchanged among components in order to make cooperative scheduling decisions; since rejected responses are returned, an increased message overhead is observed.
Similarly, [19] suggests an algorithm that allows rejected messages to be returned in the case a grid does not have the required slots for allocation. The authors in [10] suggest that a node that receives a message becomes aware of available resources in the pool. This includes messages that are exchanged across all the resources available in the resource pool. In contrast, [11] considers a broadcasting approach where a resource is not always required to reply back. However, the majority of current performance optimization approaches overlook the benefits that may derive from a more fine-grained message exchanging approach.

THE INTER-CLOUD META-SCHEDULING (ICMS) FRAMEWORK
The ICMS is the means to represent the inter-cloud service distribution and allows the integration of modular policies. The ICMS is organized in a layered structure, as detailed in Figure 1. The primary functionalities are divided into three layers, namely the service submission, the distributed resource and the local resource management layers. In layer 1, a pre-defined topology includes users that forward requests to layer 2. The latter includes a random topology based on random interconnections of distributed meta-brokers (represented as nodes) to exchange services. The service distribution is based on messages that are exchanged among meta-brokers. The ICMS supports dynamic workload management to allow decision-making for service distribution at the meta-brokering level, as detailed in [15]. As indicated earlier, we focus on IaaS, so each service encompasses a request for a Virtual Machine (VM) with regard to computational power and other related parameters (i.e. number of CPUs, CPU cores, memory, storage, bandwidth). Each message includes a description of such information. Layer 3 contains a topology that involves the formation of the low-level infrastructure and its entities, such as local-brokers, data stores, hosts and VMs. It should be mentioned that we assume clouds follow a standard setting (e.g. the open cloud computing interfaces) that includes a local resource broker controlling interactions with the datacenter hypervisor, which in turn sandboxes each service to a VM. Policies for scheduling and local resource management are implemented in the local resource broker.
As shown in Figure 1, the layers include the key elements of the service life cycle, which are to plan, deliver and operate. Layer 1, the service submission management layer, is responsible for creating the service configuration by translating user requirements to a system specification. The output is in a form that is recognized by the inter-cloud entities. Layer 2, the distributed resource management layer, collects service submissions and descriptions, extracts information regarding performance criteria (e.g. service size) and forwards it to the appropriate execution entity. This entity could be either a local resource queue or a remote meta-broker that further distributes the service to interconnected brokers. Layer 3, the local resource management layer, offers the service execution environment. Here, services are forwarded to the lowest level of the infrastructure (the local resource management system, LRMS) and sandboxed in VMs. Prior to this, each service is queued in the LRMS queue, where a scheduling algorithm allocates services to resources depending on the configuration of the scheduler (e.g. first come first served, shortest service first, etc.). The whole ICMS is based on a group of modular policies, each of which realizes the layered structure and the dynamic requirements. Figure 2 illustrates the configuration of the four modules of the ICMS conceptual architecture, namely Service Request, Service Distribution, Service Availability and Service Allocation. Firstly, the "Service Request" module includes the user specification and the service formation process. Each service request is recorded in a service level agreement (SLA) representation. SLAs describe service requirements (e.g. service CPU) along with a user policy for priorities or advance reservation mechanisms for prioritized users. The "Service Distribution" module contains the message distribution, the meta-brokering and the SLA policy, as in layer 2.
In addition, the module incorporates a mechanism for interpreting and translating the content of the SLA. The "Service Availability" module contains the SLA matchmaking, dynamic workload and local resource policies, as in layer 3. This means that each local-broker can evaluate its internal resource usage (by assessing current executions) in order to decide whether it is capable of executing the service locally. Finally, the "Service Allocation" module includes the hypervisor scheduler, the host allocation and the VM allocation policies. The hypervisor is responsible for a) sharing the host's computational power between the VMs (host scheduling), b) sharing the VM allocation of computational units (VM allocation) and c) managing the hypervisor queue that holds the services in hosts.
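As an illustration only, the four modules can be viewed as stages of a submission pipeline. The `IcmsPipeline` class below, its method names, and the string-based SLA representation are all hypothetical sketches, not part of the actual ICMS implementation:

```java
// Hedged sketch: the four ICMS modules chained as a submission pipeline.
// All names and the string-based SLA encoding are illustrative assumptions.
class IcmsPipeline {
    // "Service Request": translate the user specification into an SLA representation.
    static String serviceRequest(String userSpec) { return "SLA(" + userSpec + ")"; }

    // "Service Distribution": forward the SLA to a meta-broker.
    static String distribute(String sla) { return sla + "->broker"; }

    // "Service Availability": the local-broker decides whether it can execute locally.
    static boolean available(String sla, boolean localCapacity) { return localCapacity; }

    // "Service Allocation": sandbox the service in a VM.
    static String allocate(String sla) { return sla + "->VM"; }

    static String submit(String spec, boolean localCapacity) {
        String sla = serviceRequest(spec);
        String routed = distribute(sla);
        return available(routed, localCapacity) ? allocate(routed) : "redistributed";
    }
}
```

A request that can be served locally ends in a VM allocation; otherwise it is marked for redistribution to remote meta-brokers.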
The communication between the ICMS modules is achieved by utilizing a novel message exchanging procedure that allows services to be exchanged as events sent and received between meta-brokers, following the Message Exchanging Optimization (MEO) model [14], [26]. The assumption here is that we have a decentralized topology of meta-brokers to allow event request-response during regular time intervals. The following steps demonstrate the process.
1) The service distribution starts when a number of services are submitted to a meta-broker. Each service request contains a set of requirements, such as time intervals (e.g. waiting time, interval) and computational units (CPU, memory, bandwidth, etc.). In addition, each service request includes priorities and advance reservation features that allow specific services to be executed on specific types of resources.
2) Each service request is stored in a list. Each list row has a message with key characteristics, including the deadline and the service length, as the means for calculating resource availability on remote resources. The service requests are dispatched during regular intervals.
3) The service requester defines the interval deadline, which defines a delay limit, and the size of the list. For large lists the deadline could be higher, as the time needed to dispatch is higher. This also considers the cost of communication among entities: a small deadline results in a small number of submissions, while a large one could lead to heavy submissions. The ICMS default interval is configurable to meet the needs of an experiment; further details are provided in the experimental analysis section.
4) The service requester collects addresses of interconnected nodes from an internal catalogue of resources. These nodes are meta-brokers that receive the requests dispatched from the service lists.
5) The service requester sends a service request as a message consisting of quality-of-service requirements (e.g. deadline, service length). The message includes the ranking criteria (e.g. turnaround, energy consumption level), so all brokers will use the same resource selection criteria. It should be noted that identification tags define a message. During communication, the tags are set to unique values in order to characterize the group of messages.
6) A broker collects a single service request and performs an internal resource availability check according to the ranking criterion. It then generates a priority of services, which is stored in a temporary ranking list. If the list is empty, the broker cannot execute the services and will not respond back. In any other case, the list with the ranked services is forwarded back to the service requester.
7) Each service request is ranked according to a scheduling function and a decision is taken accordingly.
8) In case of service availability (each service of the list can be executed locally), the broker generates a list with services.
9) In case of non-availability (e.g. the broker cannot execute all or some of the services contained in the list), a further service distribution request is re-performed using steps 1 to 6.
10) In case of complete non-availability, the broker will cease communication and, therefore, responses are not sent back.
11) A new list of service requests, ranked in descending order, is created. This forms the criteria for selecting services at the next resource management level. In the case that the broker acknowledges that the service request(s) will be executed on a remote machine, the broker redirects messages to interconnected nodes. All messages are assigned updated time deadlines.
12) The ranked lists are collected by the service requester, which compares them and decides whether a remote resource will be accepted as the host for service execution or not.
13) The procedure ends and each service request is sent to a local or remote resource.
This concludes the steps of communication. In the next section we focus on the definition of the service submission and service execution features.
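The steps above can be sketched, in heavily simplified form, as a requester/broker exchange. The `Exchange` class, the rank-by-service-length criterion, and the accept-the-largest-offer rule below are illustrative assumptions, not the actual ICMS policies:

```java
import java.util.*;

// Simplified sketch of the ICMS request/response exchange (steps 1-13 above).
// Class names, the capacity model and the selection rule are illustrative.
class Exchange {
    record Request(String id, double length, double deadline) {}

    // Step 6: a broker responds with a ranked list only if it can host something;
    // an empty list models the broker staying silent (step 10).
    static List<Request> brokerRespond(List<Request> batch, double freeCapacity) {
        List<Request> ranked = new ArrayList<>();
        for (Request r : batch)
            if (r.length() <= freeCapacity) ranked.add(r);
        ranked.sort(Comparator.comparingDouble(Request::length)); // ranking criterion
        return ranked;
    }

    // Step 12 (simplified): the requester accepts the broker offering the most services.
    static int selectBroker(List<List<Request>> responses) {
        int best = -1, bestSize = 0;
        for (int i = 0; i < responses.size(); i++)
            if (responses.get(i).size() > bestSize) {
                bestSize = responses.get(i).size();
                best = i;
            }
        return best; // -1 models complete non-availability
    }
}
```

A broker whose free capacity covers no request returns an empty list, mirroring the "no response" behavior of steps 6 and 10.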

The definition of service submission in ICMS
Let us assume that there are n meta-brokers that form a decentralized inter-cloud, where M = {m_1, m_2, ..., m_n}. Each meta-broker does not have a value but is associated with the name of the interacting cloud; for instance, cloud 1 has a meta-broker named meta-broker 1. The number n equals the number of participating clouds in an inter-cloud, thus each cloud has at least one meta-broker. Each service request is defined as j_i and is assigned to a meta-broker m_i; it contains a number of physical characteristics, namely CPU cpu_i, memory mem_i, cores cor_i, storage stor_i and bandwidth bw_i. Each j_i is a tuple that encapsulates a request: j_i = (cpu_i, cor_i, mem_i, stor_i, bw_i). It should be mentioned that a service request is an IaaS encapsulation and is defined in a similar manner to the Amazon EC2 service specification [30]. The cpu_i and cor_i define the clock rate as ClockRate_i = cpu_i * cor_i. Each service also includes its cycles per instruction, cpi_ji, used to calculate the required execution time; this helps us quantify the service size in terms of traditional job lengths. Further to this, the meta-broker defines metrics to characterize each submission, e.g. the cycles per instruction (CPI) and the execution time of j_i. The cpi_ji is defined as cpi_ji = cycles_ji / instructions_ji [6]. The execution time exec_ji is calculated as exec_ji = (instructions_ji * cpi_ji / ClockRate_ji) * 10^5. In this paper, we also define the millions of instructions per second (mips_ji) as an additional service length metric, calculated as mips_ji = (ClockRate_ji / cpi_ji) * 10^-6. Both cpi_ji and mips_ji define the service size with regard to the specified user submission. Each meta-broker m_i is assigned a latency lat_mi that defines the delay of the broker in executing a service request, including the time needed for coordination and internal communication.
The total service execution time is the sum of the latency of the meta-broker m_i and the execution time: TotalJobExecTime_ji = Latency_ji + exec_ji. The utilization of the meta-broker m_i is the ratio of the throughput of the served jobs to the total throughput of the jobs that could be served. Finally, the service performance is described by the execution time of the VM that sandboxes the service.
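A minimal sketch of these metrics follows. The `ServiceMetrics` class and its method names are hypothetical, and the 10^5 and 10^-6 scale factors from the formulas above are omitted for clarity:

```java
// Illustrative sketch of the ICMS service-size metrics defined above.
// Names are hypothetical; constant scale factors from the text are omitted.
class ServiceMetrics {
    // ClockRate_i = cpu_i * cor_i
    static double clockRate(double cpuClock, int cores) {
        return cpuClock * cores;
    }

    // cpi_ji = cycles_ji / instructions_ji
    static double cpi(double cycles, double instructions) {
        return cycles / instructions;
    }

    // exec_ji = instructions_ji * cpi_ji / ClockRate_ji (scale factor omitted)
    static double execTime(double instructions, double cpi, double clockRate) {
        return instructions * cpi / clockRate;
    }

    // TotalJobExecTime_ji = Latency_ji + exec_ji
    static double totalExecTime(double latency, double execTime) {
        return latency + execTime;
    }
}
```

For example, a service of 4x10^9 instructions with a CPI of 2 on a 2000 x 2 clock-rate resource yields an execution time of 2x10^6 time units before latency is added.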

The algorithmic structure
Our approach includes request and response entities that implement the whole service execution life cycle. Figure 3 shows the relationships of the algorithms. It should be mentioned that events are the steps that occur in the life cycle of cloud service requests. The process starts with the event formation and collection algorithms. Each event (along with its service configuration) is then ranked and distributed in the inter-cloud. The event assignment algorithm allocates the service computational units. The structure and relationship of the algorithms follow the sequence below. Firstly, the event collection, formation and sending procedure of the request entity takes place. This is followed by the event gathering, identification of specification, ranking and response procedure of a broker. Finally, the event redistribution procedure and the event collection, ranking and assignment processes take place.

The event formation algorithm
The process starts with the event-formation algorithm.
The ICMS sets a time interval int_collection and a criterion for the events to be ranked at a later stage. The assumption is that the meta-brokers have the same uptime and are interlinked in a decentralized topology. Users submit service requests to one or many meta-brokers in an inter-cloud.
During a time interval int_a for a submission a, where int_a <= int_collection, the meta-broker collects all the incoming events. The operations are defined as follows: open as the operation to open a profile, size as the method to return the size value of the profile, send as the method to send a message to an address defined as ad, run as the method to run an algorithm, wait as the method to wait for an interval to expire, and update as the method to update a list.
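The interval-bounded collection can be sketched as follows; the `EventFormation` class, the list of submission intervals, and the event naming are illustrative assumptions, not ICMS code:

```java
import java.util.*;

// Sketch of the event-formation step: only events whose submission interval
// int_a falls within int_collection are gathered. Names are illustrative.
class EventFormation {
    static List<String> collect(List<Double> submissionIntervals, double intCollection) {
        List<String> events = new ArrayList<>();
        for (int a = 0; a < submissionIntervals.size(); a++)
            if (submissionIntervals.get(a) <= intCollection) // int_a <= int_collection
                events.add("event-" + a);
        return events;
    }
}
```

Events arriving after the collection interval expires would be picked up in a later collection round.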

The event collection algorithm
This algorithm configures an interval value for collecting events from the source (e.g. users) and creates a list using the incoming service request specification. Initially, the algorithm sets a termination and redistribution flag before forwarding messages to interconnected meta-brokers. In addition, it characterizes the responder meta-broker either as a termination point or as an intermediate node in the communication. Algorithm 2 demonstrates the event collection procedure. The operations are defined as follows: decompose for the message decomposition operation, get for the collection procedure of service data, rank for the ranking procedure, set as the operation to set the required service specification, update as the method to update a list, size as the method to return the size value of the profile, send as the method to send a message to address ad, terminate as the method to terminate a message at the responder, destroy as the method to delete a list (named Li) at the responder, open as the method to open a profile, and run as the method to execute an algorithm or an operation. The event collection algorithm facilitates the assembly procedure for incoming messages and the formation of the ranked list. The algorithm identifies messages for service delegation by identifying port tags (key: tag=q for an incoming message for the requester, and tag=w for an incoming message for further redistribution/decentralization).
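The tag-based identification (tag=q, tag=w) can be sketched as a small dispatch routine; the tag values come from the text, but the `MessageRouter` class and its return labels are illustrative assumptions:

```java
// Sketch of tag-based message identification in the event collection algorithm.
// Tags q/w follow the text; the routing labels themselves are illustrative.
class MessageRouter {
    static String route(char tag) {
        switch (tag) {
            case 'q': return "requester";    // incoming message for the requester
            case 'w': return "redistribute"; // further redistribution/decentralization
            default:  return "terminate";    // unrecognized tag: terminate at responder
        }
    }
}
```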

The event ranking algorithm
The event ranking algorithm defines the criteria for service classification in the request or response from a meta-broker. To quantify this action, we aim to minimize a function that calculates a set of metrics (known as rankings). In this paper, we define a number of parameters to calculate rankings, such as execution times and total times, as well as energy consumption and service cost metrics, as detailed in Algorithm 3. The operations include size as the method to return the size value of the profile.

The degree criterion defines the degree of the decentralized meta-broker topology, as presented in [14], [26]. In addition, we have implemented the consumption-per-entity cost for monitoring energy utilization (e.g. at the datacenter and host levels). Finally, we included the cost functions for defining the message and delay costs.
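Since the exact ranking function is given only in Algorithm 3, the sketch below assumes a simple weighted sum over the metrics named above; both the `EventRanking` class and the weights are invented for illustration (lower rank is better, matching the minimization goal):

```java
// Illustrative ranking function: a weighted sum over the metrics named in the text
// (execution time, total time, energy consumption, service cost).
// The weights are hypothetical; ICMS minimizes the resulting rank value.
class EventRanking {
    static double rank(double execTime, double totalTime, double energy, double cost) {
        double wExec = 0.4, wTotal = 0.3, wEnergy = 0.2, wCost = 0.1; // assumed weights
        return wExec * execTime + wTotal * totalTime + wEnergy * energy + wCost * cost;
    }
}
```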

The event redistribution algorithm
This algorithm describes the process by which a meta-broker m_n redistributes the event request to its interconnected meta-brokers. The message redistribution algorithm implements the event relocation procedure in the case of further event dissemination. The procedure alters the tag values of messages and forwards each one to a node belonging to a personalized profile list.
A key aspect is that the algorithm operates under the initial deadline value (of the requesting node) in order to terminate in cases of interval violations. The algorithm allows messages to be forwarded only if there is no availability in the local resource pool. In this case, messages are reformed and transferred to remote entities to request resource availability according to a specific criterion.
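The deadline check and the forward-only-when-unavailable rule can be sketched as a single decision; the `Redistribution` class and its outcome labels are illustrative, not ICMS code:

```java
// Sketch of the event redistribution decision: terminate on interval violation,
// execute locally if possible, otherwise forward. Names are illustrative.
class Redistribution {
    static String handle(boolean locallyAvailable, double now, double deadline) {
        if (now > deadline) return "terminated";      // interval violation of the requester
        if (locallyAvailable) return "execute-local"; // no forwarding needed
        return "forward-to-neighbors";                // retag and redistribute the message
    }
}
```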

The event assignment algorithm
The event assignment algorithm determines the next phase of the resource allocation. Here, the events have been concluded and the service request is sandboxed in a VM. Algorithm 5 implements the allocation of services to entities (that is, to their local hosts' scheduler). The algorithm collects the execution results after the completion of a service request. In particular, for all service requests j_i, it allocates each one to the LRMS. The operations include set, to tag the service for allocation to a resource, and send(LRMS), to send the service data to the LRMS. The algorithm then schedules the service requests and allocates host computational units based on a host allocation policy. Finally, it updates the current workload w_i. This concludes the ICMS framework description.
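The set-and-send step can be sketched as follows; the `EventAssignment` class and the string tagging that stands in for set/send(LRMS) are illustrative assumptions:

```java
import java.util.*;

// Sketch of event assignment: each accepted service request is tagged and
// sent to the LRMS queue. Names and the tagging scheme are illustrative.
class EventAssignment {
    static List<String> assign(List<String> requests) {
        List<String> lrmsQueue = new ArrayList<>();
        for (String j : requests)
            lrmsQueue.add(j + ":allocated"); // set the tag, then send(LRMS)
        return lrmsQueue;
    }
}
```

In the full algorithm, the LRMS would then allocate host computational units per the host allocation policy and update the workload w_i.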

SIMULATING THE INTER-CLOUD (SIMIC) TOOLKIT
This section describes SimIC v2.0 (Simulating the Inter-Cloud version 2), a novel simulator used to implement the inter-cloud functionality. SimIC is a discrete event simulation toolkit based on the process-oriented simulation package SimJava [16]. SimIC is used to simulate an inter-cloud facility in which multiple clouds collaborate for service request distribution in a simulation setup. The package encompasses the ICMS algorithms, including users, meta-brokers, local-brokers, datacenters, hosts, hypervisors and virtual machines (VMs). In SimIC, message initialization begins at time instance 1, when a message is created at state 1. State 2 then collects the message (from the out port of state 1) and sends it to the in port of state 3. During this exchange the simulation passes from time instance 2 to 3 and finally to 4. The message is then terminated (or re-initialized by another state) in order to continue the information exchange. A more detailed discussion of the tool, illustrating its layered structure and processes, is presented in [16].
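The message life cycle described above can be rendered as a toy event loop: a message is created in one state, collected from an out-port, delivered to the next state, and the time instance advances on each hop. This is a plain-Python sketch of the idea, not the SimJava process model that SimIC actually uses.

```python
# Toy rendering of the SimIC message life cycle: the message is created at
# time instance 1, then passed along a chain of states via port queues,
# with the time instance advancing at each hop.

import collections

def run(states, hops):
    """Pass one message along `hops` (pairs of state names); return its trace."""
    trace = []
    msg = {"payload": "request", "time": 1}   # initialization at instance 1
    for src, dst in hops:
        states[src].append(msg)               # message sits at src out-port
        msg = states[src].popleft()           # dst collects from the out-port
        msg["time"] += 1                      # time instance advances
        trace.append((src, dst, msg["time"]))
    return trace

states = {s: collections.deque() for s in ("state1", "state2", "state3")}
trace = run(states, [("state1", "state2"), ("state2", "state3")])
```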

The SimIC technical features
SimIC has been developed using the Java™ 2 Platform (JDK 1.6). It is based on the process event simulation API of the SimJava version 2 distribution [6]. Its high-level structure is based on the entities of CloudSim [31], which we have extended to implement meta-scheduling capabilities and batch job simulation.

The SimIC layered Architecture
The entities and their functionality are organized in a three-layer structure: the entity layer; the queuing, behaviour and tagging layer; and the performance and tracing layer.
Specifically, layer 1 includes the entities representing the objects of the system. In a SimJava simulation, each feature is represented by a Sim_entity class that encapsulates the core functionality. Each SimIC class defines the actual behaviour (layer 2) of the entities that are the ICMS resources. The core classes are User, Meta-broker, Local-broker, Datacenter, Hypervisor, Hosts, VMs and Bucket. The initialization process begins when a user starts communication with the meta-broker through a user interface, as in a meta-scheduling system. The meta-broker acts on behalf of the user to forward the request to low-level resources (local or remote sites); this procedure is executed by a local-broker.
Layer 2 represents the core features of SimIC, including the utilization of ports, functionalities and constraints that determine the actual behaviour of the system entities. Each class contains at least one port for input or output messages to other linked entities. In addition, it incorporates mechanisms for collecting messages, taking decisions (based on policies) and forwarding requests to an entity for delegation and execution. Each entity is defined by constraints that govern its actions. The actual communication is based on the tags assigned to messages during exchange; these tags identify the origin of a message and the operations expected from a responder. Additionally, queuing refers to the orchestration of events (service messages) according to different LRMS disciplines (FCFS, SJF, PS).
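The queuing disciplines named above order the same event list differently. The sketch below shows the three orderings; the job fields are illustrative assumptions.

```python
# Illustrative orderings for the LRMS queuing disciplines mentioned in the
# text: FCFS (arrival order), SJF (shortest job first) and PS (priority).

def order(jobs, policy):
    if policy == "FCFS":             # First Come First Served: arrival order
        return sorted(jobs, key=lambda j: j["arrival"])
    if policy == "SJF":              # Shortest Job First: ascending length
        return sorted(jobs, key=lambda j: j["length"])
    if policy == "PS":               # Priority Scheduling: highest priority first
        return sorted(jobs, key=lambda j: -j["priority"])
    raise ValueError(policy)

jobs = [{"id": "a", "arrival": 2, "length": 5, "priority": 1},
        {"id": "b", "arrival": 1, "length": 9, "priority": 3}]
```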
Layer 3 relates to the performance monitoring and tracing operations of the system entities. The performance measures include VM execution time, service turnaround time, service makespan, throughput of services in an entity, host utilization levels, VM utilization levels, service latencies and VM uptimes. Most of these metrics can be utilized by different entities to measure the performance of SimIC at different instances, for example the throughput of a datacentre or the latency at a hypervisor.
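Two of the layer-3 measures can be written directly from their standard definitions: makespan is the span from the earliest start to the latest finish across a set of services, and turnaround is completion time minus submission time per service. The field names are illustrative.

```python
# Standard definitions of two layer-3 performance measures.

def makespan(services):
    """Time from the earliest start to the latest finish."""
    return max(s["finish"] for s in services) - min(s["start"] for s in services)

def turnaround(service):
    """Completion time minus submission time for one service."""
    return service["finish"] - service["submit"]

svcs = [{"submit": 0, "start": 1, "finish": 6},
        {"submit": 2, "start": 3, "finish": 9}]
```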

The SimIC entities
SimIC automates service request distribution among decentralized meta-brokers. Meta-brokers are placed on top of each cloud and communicate with other brokers to produce a distributed and interoperable cloud infrastructure (similar to grid computing). In SimIC each request is treated as unique. For example, suppose a user requests a VM with 0.25 of the performance of one host and executes a set of programs with 100 × 10^6 instructions and a CPI (cycles per instruction) of 3 (300 × 10^6 cycles for 100 × 10^6 instructions), on a machine with a clock rate of 1000 MHz (0.25 of a 4000 MHz single-core host).
The performance indicators of the VM are calculated as follows.
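Using the figures in the example above, the execution time and the resulting VM performance follow directly:

```python
# Worked example: 100e6 instructions at CPI 3 on a 1000 MHz VM slice
# (0.25 of a 4000 MHz host).

instructions = 100e6
cpi = 3                                  # cycles per instruction
clock_hz = 1000e6                        # 0.25 * 4000 MHz
exec_time = instructions * cpi / clock_hz    # ExecTime_VM in seconds -> 0.3 s
performance = 1 / exec_time              # Performance_VM = 1 / ExecTime_VM -> 3.33
```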
Specifically, each meta-broker is interconnected with one or more meta-brokers, depending on the simulation experiment. The Bucket class represents the terminal entity that collects unexecuted services and logs service profile information. These services can either be re-directed to an inter-cloud after a regular interval or terminated if there is an SLA mismatch. Termination and re-distribution flags (f_trm, f_red) are used to decide whether this is the termination or the re-distribution point.
The Cloud (local-broker) class defines an SLA matchmaking process for deciding whether the specification of the user requirements can be addressed by a local resource. The datacentre's current performance is dynamically calculated to measure the available computational power; this is realized by a messaging policy. The HostCharacteristics class imports each host's computational capacity as defined in a text file. The Hosts class represents a static computing machine; it receives an event from Hyper requesting an instance of the host characteristics. This adds a delay to the hypervisor's decision making when allocating a VM, namely the latency of the host before starting the service execution. The VM class sandboxes the user profile. A more detailed discussion of the SimIC implementation is presented in [16].
To calculate the total communication delay of messages we split the latencies incurred at the different stages of a service request execution life cycle in an inter-cloud. Assuming a set of entities E with members e_n, the messaging factor (MF) defines a metric for the cost of message distribution. In a bi-directional graph formation it is calculated as the sum of the messages received divided by the sum of the messages sent, MF = (Σ_{j=1..J} msg_received,j) / (Σ msg_sent), where J is the maximum number of received messages.

To conclude, this section presented SimIC, a new toolkit that allows system architects to configure a variety of inter-clouds in terms of entities and policies. The toolkit contains a number of scheduling algorithms and features for achieving configurable service execution. Figure 4 demonstrates the various actors and the interactions among the SimIC entities; a more detailed discussion of the entities and their relationships is presented in [16].

PERFORMANCE EVALUATION
This section demonstrates an experimental analysis and evaluation of the ICMS. The experimental setup implements the messaging approach of [14] and compares two approaches: a centralized inter-cloud (IC) and the decentralized ICMS model of inter-cloud followed in this paper. The centralized approach assumes bi-directional communication among all nodes in a cloud. We first demonstrate that there is no experimental bias by running a number of tests showing that a centralized IC does not affect cloud performance. We then configure an ICMS-based IC for service executions, similar to the centralized setup. Finally, we analyse the performance of ICMS considering service request arrivals and load distributions in both static and dynamic modes. Our results show that ICMS with dynamic workload management outperforms the static mode when all resources are available. Our simulations implement the following experimental setting, in which 5 users submit requests in cycles, as shown in Table 2. The host specification includes a total of 166 cores per cloud with an average of 10^3 MHz CPU, 10 GB RAM, 10^4 GB storage and 10 Mbps bandwidth per host.
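The messaging factor defined earlier in this section is a simple ratio of totals, which can be computed directly; the per-entity message counts below are illustrative.

```python
# Messaging factor (MF): sum of messages received divided by sum of
# messages sent across the entities of the inter-cloud graph.

def messaging_factor(received, sent):
    return sum(received) / sum(sent)

# Hypothetical counts for three entities in a bi-directional formation.
mf = messaging_factor(received=[3, 2, 4], sent=[3, 3, 3])
```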

Cloud vs. inter-cloud settings
The first experiment aims to demonstrate that IC does not affect the performance of a cloud; we therefore compare against a similar hardware setup. We show that IC performs better than or equal to a non-IC setting when both cases have exactly the same computational capacity. This means that a non-IC cloud has exactly the same capacity (CPU, memory, storage and bandwidth) as an IC-based cloud made from two clouds, cloud 1 and cloud 2. The experimental analysis involves constant submissions at intervals of 2 ms. Cloud 1 includes 5 users that submit 10 to 50 service requests. Each time a request arrives at the hypervisor, a new VM is generated according to the available resources. In both cases (cloud and IC) the utilization model of [5] is applied: resources are allocated as long as they are available, until utilization reaches its peak (100%). IC distributes the jobs based on the MEO and centralized distribution approaches discussed in [14].
In both cases (cloud and IC) all services are executed in local clouds, as there is no option for further service distribution. As the number of service requests increases, the IC makespan increases due to the latency (set to 2 ms) caused by the message exchanges. The study sets this value to a low number (2 ms) under the assumption that cloud 1 is divided over two clouds. Figure 5 shows the makespan values for 10 to 50 service requests per user to each cloud. Both trend lines show similar variations, meaning that the non-IC and IC cases follow similar makespan trends. Figure 6 shows the execution times (10 to 50 services) for both the IC and non-IC cases. The average execution time for a single service request is 5.79 ms in the non-IC case and 5.38 ms in the IC case; IC thus achieves a better execution time due to better allocation of the resources. The improvement of IC is calculated at 7% (the difference between the higher and lower values as a percentage of the higher value). Figure 7 shows the average execution time and average utilization rates for both cases: the average utilization is 37.2% for IC and 35.4% for non-IC, while the average execution time decreases in the IC case. To conclude, IC increases utilization levels because it executes more service requests while decreasing the average execution times. The values, calculated using the formulas of Section 3.1, relate to the throughput of services; they are relatively low due to the low number of service requests with respect to the cloud resources.
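The 7% figure above follows directly from the stated rule (difference between the higher and lower values as a percentage of the higher value):

```python
# Improvement rule used throughout the evaluation:
# (higher - lower) / higher * 100.

def improvement(higher, lower):
    return (higher - lower) / higher * 100

imp = improvement(5.79, 5.38)    # average execution times: non-IC vs IC
```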

The inter-cloud vs. ICMS settings
We present two cases, with 1 and 50 user submissions per cloud, and monitor the performance in each. The experiments demonstrate that ICMS performs better than or equal to the IC setting (with an augmented datacenter view) when both have the same host configuration (5 clouds). This increase in performance is due to the service distribution and meta-scheduling approach followed in the ICMS framework. In IC, each meta-broker has complete knowledge of the actual cloud infrastructure (e.g. datacenter characteristics, hosts, VMs), as it communicates with other cloud brokers for information exchange. In contrast, ICMS has only partial knowledge of the infrastructure and follows the decentralized message distribution discussed in [14]. This offers a higher level of abstraction for the entire cloud, because a set of users is mapped to a restricted set of meta-brokers at any time.

The inter-cloud vs. ICMS setting for 1 service submission per cloud
The experiment includes an IC of 5 clouds that have the same host specification as the ICMS setup. The ICMS topology is decentralized: we first assume that each cloud meta-broker can access the next one in the list. For example, meta-broker 1 sends a service request to meta-broker 2, meta-broker 2 to meta-broker 3, and so on.
Each submitted service request that cannot be executed in the local cloud is always forwarded to remote cloud(s). The availability is set so that each service can be executed in the next cloud (e.g. service 1 in cloud 2, service 2 in cloud 3, etc.). In the centralized case (IC) the assumption is that all clouds can access all other clouds directly. Figure 8 shows the makespan times for 1 service submission per user with a 1 ms interval. The values clearly decrease in the case of ICMS: the average is 519 ms for ICMS against 534 ms for IC, an improvement of 15 ms in the average values and an improvement factor of 3%. The turnaround times for 1 service submission behave similarly: the average is 524 ms for ICMS and 539 ms for IC. On this basis, the centralized case increases the turnaround times, mainly because of transferring services among datacenters. An improvement of 5% in turnaround times is observed in cloud 5, where the same submission requires 521 ms in ICMS and 548.4 ms in IC. Figure 9 shows the performance comparison of both cases in terms of response ratios. Specifically, the response ratio is calculated as the difference between the highest and lowest values of a metric divided by the highest value, i.e. (x − y) / x × 100%, where x and y represent, for example, latency_user-vm. The figure shows that the performance of ICMS increases as more users submit requests in a linear manner. Yet as requests are transferred to remote clouds (e.g. cloud 2, cloud 3, etc.) the ICMS performance decreases with respect to the original ICMS setup.
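The improvement factors quoted above can be reproduced with the same (x − y) / x ratio:

```python
# Response-ratio / improvement rule: (x - y) / x * 100, where x is the
# higher and y the lower value of the same metric.

def response_ratio(x, y):
    return (x - y) / x * 100

makespan_imp = response_ratio(534, 519)      # IC vs ICMS average makespan -> ~3%
turnaround_imp = response_ratio(548.4, 521)  # cloud 5 turnaround times -> ~5%
```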

The inter-cloud vs. ICMS setting for 50 service submissions per cloud
This experiment demonstrates the simulation results for high workload submissions (50 per user). As more services are exchanged, resource availability becomes more limited and allocation management becomes more complex. For the results to be comparable, the study considers clouds with exactly the same utilization levels (for this experiment, clouds 3 and 4 offer the same utilization of 20%, and clouds 2 and 5 a utilization of 6%). Services that cannot be executed due to resource unavailability or SLA mismatching are dropped, as dynamic workload management is inactive; that is, we do not re-schedule jobs to resources. Figure 10 shows that the makespan times for 50 services per user are slightly improved for ICMS in clouds 3 and 4 (same utilization levels). The average makespan time for IC (clouds 3 and 4) is 639,706 ms, while the same metric for ICMS (clouds 3 and 4) is 638,806 ms, a difference of 900 ms. Figure 11 shows the makespan for the clouds with low utilization of 6% (clouds 2 and 5); again, the ICMS algorithms offer lower makespan times than IC. To conclude, both cases (1 and 50 submissions) show that ICMS achieves better makespan and turnaround times. This affects resource utilization and resource usage, as the total scheduling and execution time of services is reduced.

The ICMS setting: low and high delays and 40% to 100% resource availability (dynamic workload management)
This experiment demonstrates dynamic workload management for an ICMS case. The decentralized ICMS sends service requests to different clouds by incorporating dynamic distribution. The experiment executes requests under combinations of 1 ms to 4 ms delay and 40% to 100% resource availability. The percentage relates to the ability of a cloud to execute the specific service task; e.g. 40% availability represents a cloud with low resource availability. The following list gives the combinations used in the experiment.
i. 1-40%: delay 1 ms, availability 40%
ii. *4-40%: delay 4 ms, availability 40% (where * indicates that the delay is 4 times higher)
iii. 1-100%: delay 1 ms, availability 100%
iv. *4-100%: delay 4 ms, availability 100% (where * indicates that the delay is 4 times higher)
Figure 12 shows the makespan times of ICMS for each of the four cases. When the availability is 40%, ICMS distributes service requests to all clouds; in the case of 100% availability, cloud 4 executes most of the service requests. This is because of the high number of hosts available in cloud 4, which increases the available computational power. Figure 13 shows the overall resource utilization levels of clouds 1 to 5 for all four cases. The highest utilization is found in case iii, where all services are executed with high resource availability. Case ii, involving high delays, decreases the utilization levels (due to the increasing delay when continuous submissions occur).
The final experiment demonstrates the ICMS performance for a combination of user submissions. In the first case, 50 service requests are submitted to cloud 1, while in the second case 10 service requests are submitted to each cloud by a user (a total of 50 service requests). Figure 14 shows the percentage of successful executions when comparing 1 user per cloud with all users in cloud 1. Distributing user requests over different clouds offers better percentages of successful service request execution; thus, in an IC, spreading users across different clouds can help achieve higher percentages of successful executions. Figure 15 shows the makespan and turnaround times for services served by the 5 clouds. For a high number of service request submissions, ICMS achieves a better distribution by allocating resources more efficiently (based on the lower makespan times). In addition, turnaround times for higher workloads are substantially decreased; for example, the makespan times for cloud 2 show an improvement rate of 4.9%.

CONCLUDING REMARKS AND FUTURE WORK
This work presents ICMS, a framework for inter-cloud (IC) service distribution. We developed this framework to address the issue of large-scale service request distribution in IC, which cannot be achieved with current approaches. Our experimental results support the following conclusions: (a) ICMS improves makespan times and reduces turnaround times, (b) ICMS outperforms a standard IC in terms of remote cloud invocations, and (c) ICMS improves performance each time a new service request is submitted to the IC. Future directions involve the extension of SimIC with VM migration policies; further experiments with more clouds would give a better reflection of the performance improvements. In addition, we aim to work on a message passing interface system for queuing host processors for information processing during interactions, and to explore security issues in communication between ICs in order to enhance the effectiveness of the ICMS framework. Finally, a future research direction is to test the system under high variability in inter-arrival intervals and service times, in order to evaluate probability-based distribution of services in the inter-cloud.

Figure 1 :
Figure 1: The three-layered structure of ICMS

Figure 2 :
Figure 2: The ICMS modular structure. The structure defines the time needed to communicate with the local resource to extract addresses for further distribution. Each meta-broker m_i carries a load of services described by its throughput, Throughput_{m_i}, equal to the count of services at m_i in the inter-cloud; the ICMS also calculates the utilization (usage levels) of m_i.

Figure 3 :
Figure 3: The sequence diagram of the algorithmic model

The event formation algorithm. For all services j of a meta-broker m_i, the algorithm creates a list L_i in which each row contains the characteristics of a service request. It then sets a tag value t_i and creates a message that includes L_i, t_i and a performance criterion. The default ICMS configuration uses the total service execution time, where makespan defines the time from start to finish and turnaround time is the total execution time of the schedule. The algorithm opens the profile prof_i of the entity and, for each meta-broker m_n in prof_i, sends a message to the dedicated address. It sets an interval int_dist, the distribution interval time. For a time time_i ≤ int_dist, it compares the tag t_i for validation and collects responses through a classification function defined by the performance criterion of the previous step. As soon as classification concludes, the algorithm updates the list L_i. The operations are defined as follows: get, the collection procedure for service data; set, the operation to set the required service specification; create, the operation to create a list; and open, to open a profile. The flags (f_trm, f_red) indicate whether this is the termination point or the redistribution point. For all messages msg exchanged between the service requester and responder meta-brokers, if int_req ≤ int_res the algorithm sets the tag to an indicator for returning messages (to perform validation). The broker sends the event back to the service requester in a message msg_{m_i} that includes the newly formed L_i. If size(prof_i) = 0 it sets the f_trm flag on; otherwise it sets the f_red flag. In the first case the algorithm terminates L_i, while in the second it opens the local profile prof_i and runs the redistribution algorithm to find a new resource for service execution. This allows the decentralized behaviour of the ICMS, as we assume multiple levels of interconnected resources.

The event assignment algorithm (Algorithm 5), with res denoting the performance results of the service assignment, proceeds as follows:

1. for all j ∈ jset
2.   set tag ← a
3.   allocate(j, ad)
4.   send(LRMS)
5. end for

The procedure first collects the user service request from the SLA and selects the VM allocation policy according to the LRMS specification. The default queues implemented in ICMS are First Come First Served (FCFS), Shortest Job First (SJF), Earliest Deadline First (EDF) and Priority Scheduling (PS). For all services j in queue_LRMS_i, the hypervisor policy controls the current workload w_i and calculates the total delay, which includes the turnaround time and the hypervisor processing time: TotalDelay = TurnTime + hypervisorDelay. Each service j_i is queued into queue_LRMS by adding (key_i, j_i) as a pair, for an interval int_i or for a specific queue length sizeQueue_LRMS ≤ b. Each request is then scheduled, after a specific processing delay, to a dedicated meta-broker; the Meta-broker class implements the interoperability functionality of SimIC.
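The allocation loop and the delay model TotalDelay = TurnTime + hypervisorDelay can be rendered as a small runnable sketch; the names below are illustrative, and in SimIC this logic lives inside the Hyper class.

```python
# Runnable rendering of the allocation loop: tag each service, queue it
# into the LRMS as a (key_i, j_i) pair, and compute its total delay as
# TotalDelay = TurnTime + hypervisorDelay.

def allocate_all(jobs, lrms_queue, hypervisor_delay):
    delays = {}
    for key, job in enumerate(jobs):      # for all j in jset
        job["tag"] = "ALLOCATE"           # set tag <- a
        lrms_queue.append((key, job))     # queue the (key_i, j_i) pair
        delays[job["id"]] = job["turn_time"] + hypervisor_delay
    return delays

queue = []
delays = allocate_all([{"id": "j1", "turn_time": 5.0},
                       {"id": "j2", "turn_time": 7.0}], queue, 0.5)
```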

The meta-broker collects the list L_i by running the performance-criterion classification function, which updates L_i; this includes the validation of tag t_i. The OpenHost class imports each host characteristic from a text file, allowing SimIC to access host hardware characteristics, while OpenHostsList opens from a text file a list of the individual hosts dedicated to a specific cloud. The Datacenter class accepts events for VM deployment in the cloud, as determined by a hypervisor; this class implements functionality for calculating costs and energy consumption and passes all events to a local policy enforcement engine. The Hyper class represents a hypervisor and is responsible for collecting VM requests from the Datacenter class by accessing the host and VM allocation policies. This class queues each service j_i into queue_LRMS by adding (key_i, j_i) as a tuple. The HyperCall class generates an internal thread to release the services that have been scheduled in the queue according to the LRMS algorithm (int_i and sizeQueue_LRMS ≤ b).

Figure 4 :
Figure 4: Description diagram of the SimIC entities

Figure 7 :
Figure 7: Average execution time and average utilization for 10 to 50 services (non-IC vs. IC)

Figure 8 :
Figure 8: Makespan times for 1 service per user (IC vs. ICMS)

Figure 9 :
Figure 9: Comparison of performance (response ratios) of ICMS-IC

Figure 10 :
Figure 10: Makespan times for 50 services per user for clouds with same utilization (clouds 3, 4)

Figure 11 :
Figure 11: Makespan times for 50 services per user for clouds with same utilization of 6% (clouds 2, 5)

Figure 13 :
Figure 13: Overall utilization levels for ICMS cases

The ICMS setting: a mix of user submissions and 100% resource availability

Figure 14 :
Figure 14: Successful execution percentages for cases 5a and 5b

Figure 15 :
Figure 15: Makespan and turnaround times for both cases

Table 1 :
Table 1: Summary of large-scale scheduling approaches (approach, advantages, disadvantages). [8] and [29] demonstrate ELISA and MELISA, which calculate the neighbouring nodes' load by considering job arrival rate and node loads; jobs are transferred based on a comparison of node loads rather than queue length.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited.Content may change prior to final publication.Citation information: DOI 10.1109/TSC.2015.2399312,IEEE Transactions on Services Computing 1939-1374 (c) 2015 IEEE.Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
The performance of the VM is calculated as Performance_VM = 1 / ExecTime_VM = 1 / 0.3 = 3.33.

Table 2 :
The SimIC configuration