Mirror ranking algorithm overweights update time vs bandwidth differences

I've been using apt-smart and noticed that the mirror ranking algorithm may not optimally reflect real-world performance differences. The current implementation appears to give disproportionate weight to small differences in update time compared to massive differences in bandwidth.

https://github.com/martin68/apt-smart/blob/870876dd6ba58929ea1bc8f4363a5ade4bad3938/apt_smart/__init__.py#L1125-L1150

The `sort_key` method in `CandidateMirror` uses 4 factors with equal weighting, this can lead to situations where:

```
------------------------------------------------------------------------------------------------------
| Rank | Mirror URL                         | Available? | Updating? | Last updated    | Bandwidth   |
------------------------------------------------------------------------------------------------------
|    1 | http://mirror.nyist.edu.cn/ubun... | Yes        | No        | 2 hours behind  | 257.03 KB/s |
|    2 | http://ports.ubuntu.com            | Yes        | No        | 2 hours behind  | 75.25 KB/s  |
|    3 | https://mirror.nyist.edu.cn/ubu... | Yes        | No        | 2 hours behind  | 32.02 KB/s  |
|    4 | http://mirrors.aliyun.com/ubuntu   | Yes        | No        | 3 hours behind  | 3.84 MB/s   |
|    5 | https://mirrors.sdu.edu.cn/ubuntu  | Yes        | No        | 3 hours behind  | 514.34 KB/s |
|    6 | http://mirrors.sdu.edu.cn/ubuntu   | Yes        | No        | 3 hours behind  | 391.18 KB/s |
```

A mirror with 32 KB/s bandwidth and 2-hour delay ranks higher than
A mirror with 3.84 MB/s bandwidth and 3-hour delay

The 1-hour difference in update time creates a larger sorting impact than the 2-order-of-magnitude difference in bandwidth. A mirror source with 32KB/s can hardly be considered very usable.

Assigning different weights to various factors could potentially serve as a solution for `apt-smart`

	def sort_key(self):
	"""
	A tuple that can be used to sort the mirror by its availability/performance metrics.

	The tuple created by this property contains four numbers in the following order:

	1. The number 1 when :attr:`is_available` is :data:`True` or
	the number 0 when :attr:`is_available` is :data:`False`
	(because most importantly a mirror must be available).
	2. The number 0 when :attr:`is_updating` is :data:`True` or
	the number 1 when :attr:`is_updating` is :data:`False`
	(because being updated at this very moment is bad).
	3. The negated value of :attr:`last_updated` (because the
	lower :attr:`last_updated` is, the better). If :attr:`last_updated`
	is :data:`None` then :data:`LAST_UPDATED_DEFAULT` is used instead.
	4. The value of :attr:`bandwidth` (because the higher
	:attr:`bandwidth` is, the better).

	By sorting :class:`CandidateMirror` objects on these tuples in
	ascending order, the last mirror in the sorted results will be the
	"most suitable mirror" (given the available information).
	"""
	return (int(self.is_available),
	int(not self.is_updating),
	-(self.last_updated if self.last_updated is not None else LAST_UPDATED_DEFAULT),
	self.bandwidth or 0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mirror ranking algorithm overweights update time vs bandwidth differences #25

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Mirror ranking algorithm overweights update time vs bandwidth differences #25

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions