Data Science Events
CS Colloquium: Sebastian Wild (Liverpool)
Speaker: Sebastian WildLink opens in a new window (University of Liverpool)
Title: Quicksort, Timsort, Powersort: Algorithmic ideas, engineering tricks, and trivia behind CPython’s new sorting algorithm
Abstract:
In 2002, Tim Peters invented Timsort, a clever Mergesort variant, for the CPython reference implementation of the Python programming language. Timsort is both effective in Python and a popular export product: it is used in many languages and frameworks, notably OpenJDK, the Android runtime, and the V8 JavaScript engine.
Despite its widespread use, algorithms researchers eventually discovered two flaws in Timsort's underlying algorithm: The first could lead to a stack overflow in CPython and Java; although meanwhile fixed, it exposed conceptual intricacies of the algorithm. The second flaw concerns its performance: the order in which detected sorted segments, the “runs” of the input, are merged, can be 50% more costly than necessary.
Based on ideas from optimal alphabetic trees, our Powersort merge policy finds nearly optimal merging orders with negligible overhead and using a much more transparent rule. Since Python 3.11, it is part of the CPython reference implementation.
Bio:
Sebastian Wild is a Lecturer (Assistant Professor) at the Department of Computer Science of the University of Liverpool where he works on space-efficient data structures, analysis of algorithms, and engineering of foundational algorithms. Before joining Liverpool, he was a Postdoctoral Fellow at the University of Waterloo. His PhD on the analysis of multiway quicksort, obtained from the University of Kaiserslautern, received the GI Dissertationspreis 2016.