BEGIN:VCALENDAR
VERSION:2.0
PRODID:www.dresden-science-calendar.de
METHOD:PUBLISH
CALSCALE:GREGORIAN
X-MICROSOFT-CALSCALE:GREGORIAN
X-WR-TIMEZONE:Europe/Berlin
BEGIN:VTIMEZONE
TZID:Europe/Berlin
X-LIC-LOCATION:Europe/Berlin
BEGIN:DAYLIGHT
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
DTSTART:19810329T020000
RRULE:FREQ=YEARLY;INTERVAL=1;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
DTSTART:19961027T030000
RRULE:FREQ=YEARLY;INTERVAL=1;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
UID:DSC-13025
DTSTART;TZID=Europe/Berlin:20170616T160000
SEQUENCE:1497599825
TRANSP:OPAQUE
DTEND;TZID=Europe/Berlin:20170616T170000
URL:https://www.dresden-science-calendar.de/calendar/en/detail/13025
LOCATION:TUD Andreas-Pfitzmann-Bau\, Nöthnitzer Straße 46\, 01069 Dresden
SUMMARY:Domke: Routing on the Channel Dependency Graph: A New Approach to D
 eadlock-Free\, Destination-Based\, High-Performance Routing for Lossless I
 nterconnection Networks
CLASS:PUBLIC
DESCRIPTION:Speaker: Dipl.-Math. Jens Domke\nInstitute of Speaker: \nTopics
 :\nInformatik\n Location:\n  Name: TUD Andreas-Pfitzmann-Bau (APB 1004 (Ra
 tssaal))\n  Street: Nöthnitzer Straße 46\n  City: 01069 Dresden\n  Phone
 : \n  Fax: \nDescription: In the pursuit of ever-increasing compute power
 \, and with Moore's law slowly coming to an end\, high-performance computi
 ng started to scale out to larger systems. Alongside the increasing system
  size\, the interconnection network is growing to accommodate and connect 
 tens of thousands of compute nodes. These networks have a large influence 
 on total cost\, application performance\, energy consumption\, and overall
  system efficiency of the supercomputer. Unfortunately\, state-of-the-art 
 routing algorithms\, which define the packet paths through the network\, d
 o not utilize this important resource efficiently. Topology-aware routing 
 algorithms become increasingly inapplicable due to irregular topologies\,
  which are either irregular by design or\, most often\, a result of hardw
 are failures. Exchanging faulty network components potentially requires
  whole-system downtime\, further increasing the cost of the failure. This
  managem
 ent approach becomes more and more impractical due to the scale of today's
  networks and the accompanying steady decrease of the mean time between fa
 ilures. Alternative methods of operating and maintaining these high-perfor
 mance interconnects\, both in terms of hardware- and software-management\,
  are necessary to mitigate negative effects experienced by scientific appl
 ications executed on the supercomputer. However\, existing topology-agnost
 ic routing algorithms either suffer from poor load balancing or are not bo
 unded in the number of virtual channels needed to resolve deadlocks in the
  routing tables. Using the fail-in-place strategy\, a well-established met
 hod for storage systems to repair only critical component failures\, is a 
 feasible solution for current and future HPC interconnects as well as othe
 r large-scale installations such as data center networks. However\, an ap
 propriate combination of topology and routing algorithm is required to min
 imize the throughput degradation for the entire system. This thesis contri
 butes a network simulation toolchain to facilitate the process of finding 
 a suitable combination\, either during system design or while it is in ope
 ration. On top of this foundation\, a key contribution is a novel scheduli
 ng-aware routing\, which reduces fault-induced throughput degradation whil
 e improving overall network utilization. The scheduling-aware routing perf
 orms frequent property-preserving routing updates to optimize the path bal
 ancing for simultaneously running batch jobs. The increased deployment of 
 lossless interconnection networks\, in conjunction with fail-in-place mode
 s of operation and topology-agnostic\, scheduling-aware routing algorithms
 \, necessitates new solutions to the routing-deadlock problem. There
 fore\, this thesis further advances the state-of-the-art by introducing a 
 novel concept of routing on the channel dependency graph\, which allows th
 e design of a universally applicable destination-based routing capable of
  optimizing the path balancing without exceeding a given number of virtual
  channels\, which are a common hardware limitation. This disruptive innova
 tion enables implicit deadlock avoidance during path calculation\, instead
  of solving both problems separately\, as all previous solutions do.
DTSTAMP:20260523T061951Z
CREATED:20170602T080513Z
LAST-MODIFIED:20170616T075705Z
END:VEVENT
END:VCALENDAR