This is an experimental format for representing DNS information in CBOR with the goals to:
- Be able to stream the information
- Support incomplete, broken and/or invalid DNS
- Have close to no data quality and signature degradation
- Support additional non-DNS meta data (such as ICMP/TCP attributes)
In CBOR you are expected to have one root element, most likely an array or map. This format does not have a root element, instead you are expected to read one CBOR array element at a time as a stream of CBOR elements with the first array element being the stream initiator object.
[stream_init]
[message]
...
[message]
Here are some number on the compression rate compared to PCAP:
| Uncompressed | PCAP | CDS | Factor |
|---|---|---|---|
| client | 458373 | 133640 | 0,2915 |
| zonalizer | 51769844 | 9450475 | 0,1825 |
| large ditl | 1003931674 | 298167709 | 0,2970 |
| small ditl | 1651252 | 603314 | 0,3653 |
| Gzipped | PCAP | CDS | Factor | F/Uncompressed |
|---|---|---|---|---|
| client | 108136 | 45944 | 0,4248 | 0,1002 |
| zonalizer | 12468329 | 2485620 | 0,1993 | 0,0480 |
| large ditl | 327227203 | 117569598 | 0,3592 | 0,1171 |
| small ditl | 539323 | 253402 | 0,4698 | 0,1534 |
| Xzipped | PCAP | CDS | Factor | F/Uncompressed |
|---|---|---|---|---|
| client | 76248 | 36308 | 0,4761 | 0,0792 |
| zonalizer | 7894356 | 1695920 | 0,2148 | 0,0327 |
| large ditl | 267031412 | 86747604 | 0,3248 | 0,0864 |
| small ditl | 442260 | 206596 | 0,4671 | 0,1251 |
clientis a couple of hours of DNS from my workstationzonalizeris half a day from Zonalizer which continuously tests gTLDslarge ditl,small ditlare capture from DITL
int: A CBOR integer (major type 0x00)uint: A CBOR integer (value >= 0, major type 0x00)nint: A CBOR negative integer (value < 0, major type 0x00), this type has special meaning seeNegative Integerssimple: A CBOR simple value (major type 0xe0)bytes: A CBOR byte string (major type 0x40)string: A CBOR UTF-8 string (major type 0x60)any: Any CBOR valuebool: A CBOR booleanrindex: A CBOR negative integer that is a reverse index, seeDeduplication
union: Can be used to merge the given array or map into the current objectoptional: The attribute or object reference is optional
CBOR encodes negative numbers in a special way and this format uses that for none negative number to tell them apart.
Because of that, all negative numbers needs special decoding:
value = -value - 1
The object code below uses:
[and]to indicate the start and end of an arraytype nameper object attributenameper object reference...to indicate a list of previous definition(,|and)to indicate list of various types that the attribute can be
The initial object in the stream.
[
string version,
union stream_option option,
...
]
version: The version of the formatoption: A list of stream option objects
A stream option that can specify critical information about the stream and
how it should be decoded, see Stream Options for more information.
[
uint option_type,
optional any option_value
]
option_type: The type of option represented as a numberoption_value: The option value
A message object that describes various DNS packets or other information.
[
optional bool is_complete,
union timestamp timestamp,
simple message_bits,
union ip_header ip_header,
union ( icmp_message | udp_message | tcp_message | dns_message ) content
]
is_complete: Will exist and be false if the message is not complete and following attributes may not existstimestamp: A timestamp objectmessage_bits: Bitmap indicating message content- Bit 0: 0=Not DNS 1=DNS
- Bit 1: if DNS: 0=UDP 1=TCP else: 0=ICMP/ICMPv6 1=TCP
- Bit 2: Fragmented (0=no 1=yes)
- Bit 3: Malformed (0=no 1=yes)
ip_header: An IP header objectcontent: The message content, may be an ICMP, UDP, TCP or DNS message object
The timestamp object of a message.
[
( uint seconds | nint diff_from_last ),
optional uint useconds
optional uint nseconds
]
seconds: The seconds of a UNIX timestampdiff_from_last: The differentially from lasttimestamp.secondsuseconds: The microseconds of a UNIX timestamp or ifdiff_from_lastis used it will be the differentially from lasttimestamp.usecondsnseconds: The nanoseconds of a UNIX timestamp or ifdiff_from_lastis used it will be the differentially from lasttimestamp.nseconds
The IP header of a message.
[
( uint | nint ) ip_bits,
optional bytes src_addr,
optional bytes dest_addr,
optional ( uint | nint ) src_dest_port
]
ip_bits: Bitmap indicating IP header content, if the type isnintit also indicates that it is a reverse from last, seeDeduplicationfor more information- Bit 0: address family (0=AF_INET, 1=AF_INET6)
- Bit 1: src_addr present
- Bit 2: dest_addr present
- Bit 3: port present
src_addr: The source address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6dest_addr: The destination address with length specifying address family, 4 bytes is IPv4 and 16 is IPv6src_dest_port: A combined source and destination port, seeSource And Destination Port
The source and destination port are combined into one value. If both source and destination exists then the value is larger then 65535, the destination will be the high 16 bits and source the low otherwise it will only be the source. If the value is negative then only the destination exists.
if value > 0xffff then
src_port = value & 0xffff
dest_port = value >> 16
else if value < 0 then
dest_port = -value - 1
else
src_port = value
if ip_header.ip_bits.1=0 && ip_header.ip_bits.2=0
[
uint type,
uint code
]
type: TODOcode: TODO
if ip_header.ip_bits.1=1 && ip_header.ip_bits.2=0
TODO
if ip_header.ip_bits.2=1
[
uint seq_nr,
uint ack_nr,
uint tcp_bits,
uint window
]
seq_nr: TODOack_nr: TODOtcp_bits: TODO- 0: URG
- 1: ACK
- 2: PSH
- 3: RST
- 4: SYN
- 5: FIN
window: TODO
A DNS packet.
[
optional bool is_complete,
uint id,
uint raw_dns_header, # TODO
optional nint count_bits,
optional uint qdcount,
optional uint ancount,
optional uint nscount,
optional uint arcount,
optional simple rr_bits,
optional [
dns_question question,
...
],
optional [
resource_record answer,
...
],
optional [
resource_record authority,
...
],
optional [
resource_record additional,
...
],
optional bytes malformed
]
is_complete: Will exist and be false if the message is not complete and following attributes may not existsid: DNS identifierraw_dns_header: TODOcount_bits: Bitmap indicating which counts are present, seeNegative IntegersandDeduplication- Bit 0: qdcount present
- Bit 1: ancount present
- Bit 2: nscount present
- Bit 3: arcount present
qdcount: Number of question records if different from the number of entries inquestionancount: Number of answer resource records if different from the number of entries inanswernscount: Number of authority resource records if different from the number of entries inauthorityarcount: Number of additional resource records if different from the number of entries inadditionalquestion: The question recordsanswer: The answer resource recordsauthority: The authority resource recordsadditional: The additional resource recordsmalformed: Holds the bytes of the message that was not parsed
A DNS question record.
[
optional bool is_complete,
( bytes | compressed_name | rindex ) qname,
optional uint qtype,
optional nint qclass
]
is_complete: Will exist and be false if the message is not complete and following attributes may not existsqname: The QNAME as byte string, a name compression object or a reverse index, seeDeduplicationqtype: The QTYPE, seeDeduplicationqclass: The QCLASS, seeNegative IntegersandDeduplication
An compressed name which has references to other labels within the same message.
[
( bytes label | uint label_index | nint offset | simple extension_bits ),
...
]
label: A byte string with a label partlabel_index: An index to the N byte string label in the messageoffset: The offset specified in the DNS message which could not be translated into a label indexextension_bits: The extension bits if not 0b00 or 0b11 # TODO: add the extension bits
A DNS resource record.
[
optional bool is_complete,
( bytes | compressed_name | rindex ) name,
optional simple rr_bits,
optional uint type,
optional uint class,
optional uint ttl,
optional uint rdlength,
( bytes | mixed_rdata ) rdata
]
is_complete: Will exist and be false if the message is not complete and following attributes may not existsname:rr_bits: Bitmap indicating what is present, seeDeduplication- Bit 0: type
- Bit 1: class
- Bit 2: ttl
- Bit 3: rdlength # TODO: reverse index for TTL?
type: The resource record typeclass: The resource record classttl: The resource record ttlrdlength: The resource record rdata lengthrdata: The resource record data
An array mixed with resource data and compressed names.
[
( bytes | compressed_name ) rdata_part,
...
]
rdata_part: The parts of the resource records data
Each option is specified here as OptionName(OptionNumber) and optional OptionValue type.
RLABELS(0) uint: Indicates how many labels should be stored in the reverse label index before discarding themRLABEL_MIN_SIZE(1) uint: The minimum size a label must be to be put in the reverse label indexRDATA_RINDEX_SIZE(2) uint: Indicates how many rdata should be stored in the reverse rdata index before discarding themRDATA_RINDEX_MIN_SIZE(3) uint: The minimum size a rdata must be to be put in the reverse rdata indexUSE_RDATA_INDEX(4): If present then the stream uses rdata indexingRDATA_INDEX_MIN_SIZE(5) uint: The minimum size a rdata must be to be put in the rdata index
Deduplication is done in a few different ways, data may be left out to indicate that it is the same as the previous value, an index may be used to indicate that it is the same as the N previous value and a reverse index may be used to indicate that it is the N previous value looking backwards across the stream.
In other words, using the index deduplication you will need to build a table of the values you come across during the decoding of the stream, this table can grow very large.
As an smaller alternative a reverse index can indicate often used data from the N previous value looking back over the stream. This type of index also reorder itself to try and put the most used data always in the index.
TODO: details of each attribute and it's deduplication