I've been using apt-smart and noticed that the mirror ranking algorithm may not optimally reflect real-world performance differences. The current implementation appears to give disproportionate weight to small differences in update time compared to massive differences in bandwidth.
|
def sort_key(self): |
|
""" |
|
A tuple that can be used to sort the mirror by its availability/performance metrics. |
|
|
|
The tuple created by this property contains four numbers in the following order: |
|
|
|
1. The number 1 when :attr:`is_available` is :data:`True` or |
|
the number 0 when :attr:`is_available` is :data:`False` |
|
(because most importantly a mirror must be available). |
|
2. The number 0 when :attr:`is_updating` is :data:`True` or |
|
the number 1 when :attr:`is_updating` is :data:`False` |
|
(because being updated at this very moment is *bad*). |
|
3. The negated value of :attr:`last_updated` (because the |
|
lower :attr:`last_updated` is, the better). If :attr:`last_updated` |
|
is :data:`None` then :data:`LAST_UPDATED_DEFAULT` is used instead. |
|
4. The value of :attr:`bandwidth` (because the higher |
|
:attr:`bandwidth` is, the better). |
|
|
|
By sorting :class:`CandidateMirror` objects on these tuples in |
|
ascending order, the last mirror in the sorted results will be the |
|
"most suitable mirror" (given the available information). |
|
""" |
|
return (int(self.is_available), |
|
int(not self.is_updating), |
|
-(self.last_updated if self.last_updated is not None else LAST_UPDATED_DEFAULT), |
|
self.bandwidth or 0) |
The sort_key method in CandidateMirror uses 4 factors with equal weighting, this can lead to situations where:
------------------------------------------------------------------------------------------------------
| Rank | Mirror URL | Available? | Updating? | Last updated | Bandwidth |
------------------------------------------------------------------------------------------------------
| 1 | http://mirror.nyist.edu.cn/ubun... | Yes | No | 2 hours behind | 257.03 KB/s |
| 2 | http://ports.ubuntu.com | Yes | No | 2 hours behind | 75.25 KB/s |
| 3 | https://mirror.nyist.edu.cn/ubu... | Yes | No | 2 hours behind | 32.02 KB/s |
| 4 | http://mirrors.aliyun.com/ubuntu | Yes | No | 3 hours behind | 3.84 MB/s |
| 5 | https://mirrors.sdu.edu.cn/ubuntu | Yes | No | 3 hours behind | 514.34 KB/s |
| 6 | http://mirrors.sdu.edu.cn/ubuntu | Yes | No | 3 hours behind | 391.18 KB/s |
A mirror with 32 KB/s bandwidth and 2-hour delay ranks higher than
A mirror with 3.84 MB/s bandwidth and 3-hour delay
The 1-hour difference in update time creates a larger sorting impact than the 2-order-of-magnitude difference in bandwidth. A mirror source with 32KB/s can hardly be considered very usable.
Assigning different weights to various factors could potentially serve as a solution for apt-smart
I've been using apt-smart and noticed that the mirror ranking algorithm may not optimally reflect real-world performance differences. The current implementation appears to give disproportionate weight to small differences in update time compared to massive differences in bandwidth.
apt-smart/apt_smart/__init__.py
Lines 1125 to 1150 in 870876d
The
sort_keymethod inCandidateMirroruses 4 factors with equal weighting, this can lead to situations where:A mirror with 32 KB/s bandwidth and 2-hour delay ranks higher than
A mirror with 3.84 MB/s bandwidth and 3-hour delay
The 1-hour difference in update time creates a larger sorting impact than the 2-order-of-magnitude difference in bandwidth. A mirror source with 32KB/s can hardly be considered very usable.
Assigning different weights to various factors could potentially serve as a solution for
apt-smart