Workflows Scheduler for Monte Carlo
samples production at CMS
Author: Julius Skripkauskas
Vilnius University, Mathematics and Informatics Faculty
Supervisors: Jean-Roch Vlimant, Giovanni Franzoni
Project description
• A timetable is displayed so that a user can check the position of requests and interact with a variety of configurations.
• The goal is to predict the completion time of single samples as well as of whole production campaigns, taking into account the several concurrent production campaigns and the CMS computing resource constraints.
2
3
What is an MC production request?
• MC production requests are requests for Monte Carlo sample production.
• They are submitted by physicists.
• Each request carries its own information, such as number of events, keywords, and the time it takes to compute one event. The “event” is the main element of a request.
• A single event is a single simulation of some type of particle collision.
• A request is N simulations of the same type.
• A single request is displayed as a single block in the timetable.
• Requests are kept in the MCM database.
• MCM (Monte Carlo Management) is a replacement for PREP for sample request management.
4
Requests data format
• Attributes of a converted request (request data in the scheduler database):
  • Id
  • Width – time it takes to complete the request.
  • Height – number of events per time unit.
  • Type:
    • Priority – requests scheduled by importance.
    • Deadline – requests scheduled by the date by which they have to be finished and by importance.
  • Keywords
  • Group
  • Source
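The attribute list above can be sketched as a small data structure. This is only an illustration: the class name `ConvertedRequest`, the function `convert`, and the identifier `EXAMPLE-0001` are hypothetical, and the conversion assumes that each slot works on one event at a time, which the slides do not state. The only relation taken directly from the definitions is that width times height equals the total number of events.

```python
from dataclasses import dataclass, field


@dataclass
class ConvertedRequest:
    """Hypothetical mirror of the converted-request attributes listed above."""
    id: str
    width: float    # time needed to complete the request (time units)
    height: float   # number of events processed per time unit
    type: str       # "priority" or "deadline"
    keywords: list = field(default_factory=list)
    group: str = ""
    source: str = ""


def convert(prepid, total_events, time_per_event, slots,
            req_type="priority", keywords=None, group="", source="mcm"):
    # Assumption: each slot processes one event at a time, so `slots` slots
    # complete slots / time_per_event events per time unit.
    height = slots / time_per_event
    # From the definitions: width * height == total number of events.
    width = total_events / height
    return ConvertedRequest(prepid, width, height, req_type,
                            list(keywords or []), group, source)


# One million events at 10 time units per event, run on 500 slots.
req = convert("EXAMPLE-0001", total_events=1_000_000,
              time_per_event=10.0, slots=500)
print(req.width, req.height)
```

Here the time unit is whatever unit `time_per_event` is given in; a real converter would also have to decide how many slots a request is allowed to occupy.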
5
Requests data format
[Figure: a request as stored in the MCM database and the same request as stored in the scheduler database]
6
Scheduler
• Schedules requests and visualizes them in a graphical environment.
• Allows MC production managers to predict when the production of a certain sample will be ready.
• Predicting the evolution of the production is necessary because of the limited, distributed resources of CMS.
• Sample production uses resources from different clusters in different Tiers; one of the inputs to the scheduler is the number of such resources available.
7
Main parts of scheduler
[Diagram with the following boxes]
• MC production requests
• Scheduling Views
• Conversion of requests data
• Scheduling of converted requests data
• Displaying scheduled data
8
Data paths
• Different sources: a data source is either a database like MCM holding non-converted data, or a csv file uploaded to a server.
• Translator: converts requests data from the different sources. Depending on the source, the data is either saved in the database or given directly to the scheduler.
• Database: holds converted requests data from other databases like MCM.
• Scheduler: schedules the converted data so that it can be displayed in a timetable.
• Web page: interactive web interface that displays the timetable and allows the user to pass different configurations or more data to the scheduler.
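As an illustration of the translator step for an uploaded csv file, the sketch below turns csv text into converted-request records and drops faulty rows. The column names (`id`, `width`, `height`, `type`, `keywords`), the space-separated keyword format, and the function name `translate_csv` are assumptions made for this example; the actual csv layout accepted by the scheduler is not given in the slides.

```python
import csv
import io


def translate_csv(text, label):
    """Turn csv rows into converted-request dicts, skipping faulty rows."""
    converted = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            width = float(row["width"])
            height = float(row["height"])
        except (KeyError, TypeError, ValueError):
            continue  # faulty row: missing or non-numeric size
        if width <= 0 or height <= 0:
            continue  # faulty row: a block must have a positive area
        converted.append({
            "id": row.get("id", ""),
            "width": width,
            "height": height,
            "type": row.get("type") or "priority",
            "keywords": (row.get("keywords") or "").split(),
            "source": label,  # label provided by the user at upload time
        })
    return converted


sample = (
    "id,width,height,type,keywords\n"
    "A,3600,100,priority,new Spring14dr\n"
    "B,abc,50,priority,new\n"  # non-numeric width, will be discarded
)
rows = translate_csv(sample, label="my-upload")
print(rows)
```

Records produced this way could then be handed straight to the scheduler or stored alongside converted MCM requests, depending on the source.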
9
Scheduler - algorithm
• Previously developed by Štěpán Balcar.
• The algorithm does not have to find the best solution.
• The timetable is represented as a permutation, inspired by the Smith Evolution Algorithm.
• The permutation determines the order in which tasks are sequentially inserted.
• An inserter function tries to insert tasks into all positions in the free space.
• Available positions are sorted according to the direction in which insertion is done, inspired by the Hwang et al. algorithm.
• The asymptotic complexity is O(N * log(N) + A * N), where A is the number of operations needed to insert one block.
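The permutation idea and the two terms of the complexity estimate can be sketched as follows. This is not the original implementation: `decode`, `priority_permutation` and the toy `append_inserter` are hypothetical names, and the inserter here simply appends blocks one after another instead of searching the free space.

```python
def decode(permutation, tasks, insert):
    """Insert tasks one by one in the order given by `permutation`.

    `insert(timetable, task)` is a caller-supplied inserter. Its cost is the
    "A" term, and it is called exactly N times, giving the A * N part.
    """
    timetable = []
    for index in permutation:
        insert(timetable, tasks[index])
    return timetable


def priority_permutation(tasks):
    # Sorting the task indices is the N * log(N) part.
    return sorted(range(len(tasks)),
                  key=lambda i: tasks[i]["priority"], reverse=True)


def append_inserter(timetable, task):
    # Toy inserter (assumption): place each task right after the previous one.
    start = timetable[-1]["end"] if timetable else 0
    timetable.append({"id": task["id"], "start": start,
                      "end": start + task["width"]})


tasks = [
    {"id": "a", "priority": 1, "width": 2},
    {"id": "b", "priority": 3, "width": 1},
    {"id": "c", "priority": 2, "width": 4},
]
plan = decode(priority_permutation(tasks), tasks, append_inserter)
print(plan)
```

Separating the permutation from the inserter means a different ordering or a smarter inserter can be swapped in without touching the decoding loop.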
10
Scheduler – algorithm(1)
11
Scheduler – algorithm(1)
• First, insert requests with a deadline.
• Deadlines do not exist yet.
• Insertion starts from the bottom right corner of the area bounded by the deadline.
• Requests are inserted one by one in ascending order of priority.
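A simplified version of this first phase is sketched below. It models the timetable as a list `free`, where `free[t]` is the number of free slots in time unit `t`, and places each deadline block as late as possible before its deadline. This is an assumption-laden simplification: it ignores the vertical position of a block inside the timetable, and the function name and sample requests are hypothetical.

```python
def insert_before_deadline(free, width, height, deadline):
    """Place a block as late as possible so that it ends by `deadline`.

    `free[t]` is the number of free slots in time unit t. Returns the start
    time, or None if the block does not fit anywhere before the deadline.
    """
    latest_start = min(deadline, len(free)) - width
    for start in range(latest_start, -1, -1):  # scan from the right
        if all(free[t] >= height for t in range(start, start + width)):
            for t in range(start, start + width):
                free[t] -= height  # reserve the slots
            return start
    return None


free = [10] * 8  # 8 time units, 10 slots each
deadline_requests = [
    {"id": "d1", "priority": 2, "width": 3, "height": 6, "deadline": 8},
    {"id": "d2", "priority": 1, "width": 2, "height": 6, "deadline": 8},
]
starts = {}
# Ascending order of priority, as on the slide.
for r in sorted(deadline_requests, key=lambda r: r["priority"]):
    starts[r["id"]] = insert_before_deadline(
        free, r["width"], r["height"], r["deadline"])
print(starts, free)
```

With these numbers the two blocks cannot overlap in time (6 + 6 slots exceed 10), so the second block is pushed to the left of the first one.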
12
Scheduler - algorithm(2)
13
Scheduler – algorithm(2)
• Insert priority blocks into the free spaces (the empty space between deadline blocks).
• Insertion starts from the bottom left corner.
• Requests are inserted one by one in descending order of priority.
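The second phase can be sketched in the same simplified spirit: priority blocks are placed at the earliest position that still has enough free slots. Here `free[t]` is the number of free slots left in time unit `t` after some slots have already been reserved; the function name, the starting capacities and the sample requests are all hypothetical, and the vertical position of blocks is again ignored.

```python
def insert_earliest(free, width, height):
    """Place a block at the earliest start where `height` slots are free
    for `width` consecutive time units. Returns the start time or None."""
    for start in range(0, len(free) - width + 1):  # scan from the left
        if all(free[t] >= height for t in range(start, start + width)):
            for t in range(start, start + width):
                free[t] -= height  # reserve the slots
            return start
    return None


# Free slots per time unit; the later units are partly taken already.
free = [10, 10, 10, 4, 4, 4, 4, 4]
priority_requests = [
    {"id": "p1", "priority": 1, "width": 2, "height": 4},
    {"id": "p2", "priority": 5, "width": 2, "height": 8},
    {"id": "p3", "priority": 3, "width": 4, "height": 2},
]
starts = {}
# Descending order of priority, as on the slide.
for r in sorted(priority_requests, key=lambda r: r["priority"], reverse=True):
    starts[r["id"]] = insert_earliest(free, r["width"], r["height"])
print(starts, free)
```

The highest-priority block gets the earliest position, and lower-priority blocks fill whatever space remains.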
14
Scheduler – older version
• Developed by Štěpán Balcar.
• Used fake data, which did not reflect real data well enough.
• Fully implemented the scheduling algorithm (placing request blocks in the timetable).
• Offered almost no interactivity or modification in the scheduler.
• Defined the format of converted requests data.
15
Scheduler – new version
[Screenshot of the timetable with the Y axis, the X axis and the coloring options labelled]
Developed by Julius Skripkauskas: https://cms-pdmv.cern.ch/scheduler/
16
Scheduler – new version
• X axis – period of time. Scheduled requests (the colorful blocks in the timetable) have a width in days, hours and minutes that occupies part of the X axis. The X axis can be changed to display a longer or shorter period of time by choosing dates from a dropdown list.
• Y axis – available resources (slots, processing power) for computing requests. The default value is ~86000 slots; the Y axis (number of slots) can be changed by entering a number into the slots input field.
• Coloring – recoloring of already scheduled data by a chosen option. The option is selected by clicking the button with the option name on it. Additionally, a list of color labels is displayed to help distinguish among the colors and requests.
17
Scheduler – new version
Colored by the “Member of Campaign” option.
The label shows that “Spring14dr” is brown; requests with “Spring14dr” are also colored brown.
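Recoloring by an option such as “Member of Campaign” amounts to mapping each distinct attribute value to one color and building a legend from that mapping. The sketch below illustrates this; the palette, the attribute key `member_of_campaign`, the function name `color_by`, and the second campaign name `OtherCampaign` are hypothetical.

```python
from itertools import cycle

# Hypothetical palette; the real interface picks its own colors.
PALETTE = ["brown", "steelblue", "orange", "green", "purple"]


def color_by(requests, attribute):
    """Return (legend, colored).

    `legend` maps each distinct attribute value to a color, and `colored`
    lists (request id, color) pairs in the original order.
    """
    legend = {}
    colors = cycle(PALETTE)  # reuse colors if there are many values
    for r in requests:
        value = r.get(attribute, "unknown")
        if value not in legend:
            legend[value] = next(colors)
    colored = [(r["id"], legend[r.get(attribute, "unknown")])
               for r in requests]
    return legend, colored


requests = [
    {"id": "r1", "member_of_campaign": "Spring14dr"},
    {"id": "r2", "member_of_campaign": "OtherCampaign"},
    {"id": "r3", "member_of_campaign": "Spring14dr"},
]
legend, colored = color_by(requests, "member_of_campaign")
print(legend)
print(colored)
```

Because the blocks are already placed, recoloring only changes the color lookup and does not require rescheduling.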
18
Scheduler – new version
Dropdown lists change the displayed part of the X axis.
Input fields:
• “Slots” – number of slots available. After entering a number and clicking “Redraw”, everything is recalculated with the new Y axis.
• “Keywords” – filters requests by the keywords in them. After entering keywords such as the status “new” and clicking “Redraw”, everything is recalculated using only the requests that have that keyword.
Data source configuration window – allows the user to upload data in csv files to the server. Csv data is uploaded with a label provided by the user. The user can configure which data sources are used by clicking on a checkbox. After configuration and a page reload, data from the chosen sources is scheduled and displayed.
19
Scheduler – filtering by keywords example
Scheduled after filtering requests down to those of the “Spring14dr” campaign. As we can see, production of all “Spring14dr” requests takes about 4-5 days.
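Keyword filtering of this kind can be sketched as a simple set test applied before scheduling. The function name `filter_by_keywords` and the sample requests are hypothetical, and the sketch assumes that a request must carry every entered keyword; the slides do not say whether the real filter requires all keywords or any of them.

```python
def filter_by_keywords(requests, wanted):
    """Keep requests that carry every keyword in `wanted`.

    Matching is case-insensitive. Requiring all keywords (rather than any)
    is an assumption of this sketch.
    """
    wanted = {w.lower() for w in wanted}
    return [r for r in requests
            if wanted <= {k.lower() for k in r["keywords"]}]


requests = [
    {"id": "r1", "keywords": ["Spring14dr", "new"]},
    {"id": "r2", "keywords": ["OtherCampaign", "new"]},
    {"id": "r3", "keywords": ["Spring14dr", "done"]},
]
spring = filter_by_keywords(requests, ["spring14dr"])
spring_new = filter_by_keywords(requests, ["Spring14dr", "new"])
print([r["id"] for r in spring], [r["id"] for r in spring_new])
```

Only the requests that pass the filter are then handed to the scheduler, which is why the timetable is recalculated after clicking “Redraw”.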
20
Scheduler – new version
• Real data – requests from the MCM database, in other words input from the actual production status of CMS, plus csv files.
• A lot of faulty data is discarded.
• A lot more interactivity and modification is allowed:
  • Data is scheduled on the fly, with no scheduling on past dates.
  • The user chooses the range of the schedule to be displayed (by specifying start and end dates).
  • Possibility to choose different colorings besides coloring by priority.
  • Ability to filter by keywords (prepid, energy, source, status, etc.).
  • Functionality to modify the number of slots available.
  • Ability to upload either temporary or permanent requests data in csv files.
  • Simple interface for configuring the source list (which sources are to be used).
  • Variety of tooltips and performance improvements.
21
Improvements
• Allow more modifications, for example modifying the information of a single request.
• Improve the scheduling algorithm, perhaps by calculating and splitting the number of priority-type requests that fit into each free interval, and then scheduling the intervals at the same time using multi-threading.
• Improve the graphical design of the scheduler; hire a designer.
22