In modern network security and asset management, identifying the operating systems running across an organisation has become critical. It is the foundation on which IT and security departments build visibility and control. However, the task grows harder as the variety of operating system (OS) types increases, particularly with embedded and IoT devices, which are often unmanaged and cannot host the conventional software agents normally used for OS identification.
Fortunately, the cybersecurity field has developed a more passive approach, one that requires no software installation on endpoint devices and adapts to a wide range of OSs: “passive OS fingerprinting.” It determines which operating system a device is running by analysing characteristic patterns in the traffic that device sends over the network. Often a single packet is enough, with no need for a back-and-forth exchange between the two hosts.
In passive OS fingerprinting, protocols at many different network layers have proven useful for determining which operating system a device is running.
Starting with medium access control (MAC)
Near the bottom of the stack, at the data link layer, sits the MAC protocol. Every network interface card (NIC) carries a physical identifier called the MAC address, which the manufacturer builds into the device when it is made. The address consists of 12 hexadecimal digits, usually written as six pairs separated by hyphens or colons. The first six digits form the organisationally unique identifier (OUI), which identifies the manufacturer; the last six are assigned by that manufacturer and work much like a serial number. For OS identification, the OUI reveals who made the device, which helps deduce what kind of device it is and sometimes even which OS it is likely running.
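The OUI lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: `OUI_VENDORS` is an illustrative two-entry subset standing in for the full IEEE OUI registry, the helper names are hypothetical, and the check for locally administered (for example, randomised) addresses is an extra safeguard not discussed in the text.

```python
import re
from typing import Optional

# Illustrative subset of the IEEE OUI registry; a real tool would load the
# full public registry instead of this small hypothetical table.
OUI_VENDORS = {
    "B827EB": "Raspberry Pi Foundation",
    "005056": "VMware",
}


def normalise_mac(mac: str) -> str:
    """Strip separators (hyphens, colons, dots) and upper-case the hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).upper()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits


def oui(mac: str) -> str:
    """The first six hex digits identify the manufacturer."""
    return normalise_mac(mac)[:6]


def is_locally_administered(mac: str) -> bool:
    """Bit 0x02 of the first octet marks a locally administered address
    (e.g. a randomised MAC), whose first six digits are not a real OUI."""
    return bool(int(normalise_mac(mac)[:2], 16) & 0x02)


def vendor_hint(mac: str) -> Optional[str]:
    """Return the manufacturer for a MAC address, or None if unknown."""
    if is_locally_administered(mac):
        return None
    return OUI_VENDORS.get(oui(mac))
```

For example, `vendor_hint("b8-27-eb-12-34-56")` returns `"Raspberry Pi Foundation"`, a vendor whose boards typically run a Linux distribution; the OS is inferred from the vendor, not read from the address itself.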
Moving up the stack
Moving up into the network and transport layers, the TCP/IP headers offer a richer source of data. OS identification from TCP/IP data relies on variability in how the TCP and IP protocols are implemented. Both protocols define header parameters whose values are partly left to the implementation, and different OSs tend to make distinctive choices for them, providing valuable clues for fingerprinting.
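Some of the most common parameters used in OS fingerprinting are the initial time to live (TTL), the TCP window size, the IP “Don’t Fragment” (DF) flag, and the TCP options (their values and order). Because each router along the path decrements the TTL by one, the initial value is usually inferred by rounding the observed TTL up to the nearest common default, such as 64, 128, or 255. For example, if a device sends a TCP SYN packet with the “Don’t Fragment” flag set, an initial TTL of 64, a window size of 65535, and the TCP option sequence 02, 01, 03, 01, 01, 08, 04, 00 (MSS, NOP, window scale, NOP, NOP, timestamp, SACK permitted, end of option list), that combination is a strong indicator that it is running macOS.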
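A minimal sketch of this kind of signature matching follows, assuming the TTL, DF flag, window size, and raw TCP option bytes have already been extracted from a captured SYN packet (the capture step is not shown). The signature table holds only the macOS example from the text, and `option_kinds`, `initial_ttl`, `SynSignature`, and `guess_os` are hypothetical helper names, not part of any existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


def option_kinds(options: bytes) -> tuple[int, ...]:
    """Return the TCP option kinds in the order they appear in the header."""
    kinds = []
    i = 0
    while i < len(options):
        kind = options[i]
        kinds.append(kind)
        if kind == 0:  # End of Option List: anything after it is padding
            break
        if kind == 1:  # No-Operation: a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(options) or options[i + 1] < 2:
            raise ValueError(f"malformed TCP option at offset {i}")
        i += options[i + 1]  # the length byte covers kind, length and data
    return tuple(kinds)


def initial_ttl(observed_ttl: int) -> int:
    """Round an observed TTL up to the nearest common initial value."""
    for default in (32, 64, 128, 255):
        if observed_ttl <= default:
            return default
    raise ValueError(f"TTL out of range: {observed_ttl}")


@dataclass(frozen=True)
class SynSignature:
    ttl: int
    df: bool
    window: int
    options: tuple[int, ...]


# One signature, taken from the macOS example in the text.
SIGNATURES = {
    SynSignature(64, True, 65535, (2, 1, 3, 1, 1, 8, 4, 0)): "macOS",
}


def guess_os(observed_ttl: int, df: bool, window: int,
             options: bytes) -> Optional[str]:
    """Exact-match the SYN's features against the signature table."""
    signature = SynSignature(initial_ttl(observed_ttl), df, window,
                             option_kinds(options))
    return SIGNATURES.get(signature)
```

Exact matching is deliberately simple here; a single changed option or window value defeats it, which is why dedicated passive fingerprinting tools such as p0f use large signature databases with wildcard fields.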
HTTP unveils valuable insights
Within the application layer, numerous protocols can reveal a device’s OS type, sometimes including its precise version or distribution. However, some of these fields can be customised by the user, which makes them less dependable for accurate identification.
HTTP is one of the most widely used application protocols for OS fingerprinting, thanks to the User-Agent field in the HTTP request header. This field, set by the client application (e.g., a web browser), typically describes the application itself, the operating system it runs on, and sometimes the underlying device. For instance, an HTTP request whose User-Agent field reads “Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0” identifies the client as a Firefox browser running on Windows 7, since Windows NT 6.1 is the internal version number of Windows 7.
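A rough sketch of User-Agent parsing is shown below. It only recognises a few coarse OS tokens plus the Windows NT version numbers; the function and table names are hypothetical, and real User-Agent parsers handle far more variants (and, as noted above, the field can be spoofed).

```python
import re
from typing import Optional

# Windows NT version tokens as they appear in User-Agent strings.
# Windows 11 browsers still report "Windows NT 10.0", so the two are ambiguous.
WINDOWS_NT_VERSIONS = {
    "5.1": "Windows XP",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
    "6.3": "Windows 8.1",
    "10.0": "Windows 10 or 11",
}


def os_from_user_agent(user_agent: str) -> Optional[str]:
    """Return a coarse OS label from a User-Agent string, or None if unknown."""
    match = re.search(r"Windows NT (\d+\.\d+)", user_agent)
    if match:
        version = match.group(1)
        return WINDOWS_NT_VERSIONS.get(version, f"Windows NT {version}")
    # Order matters: Android UAs also contain "Linux", and iPhone/iPad UAs
    # contain "like Mac OS X", so the more specific tokens are checked first.
    if "Android" in user_agent:
        return "Android"
    if "iPhone" in user_agent or "iPad" in user_agent:
        return "iOS/iPadOS"
    if "Mac OS X" in user_agent:
        return "macOS"
    if "Linux" in user_agent:
        return "Linux"
    return None
```

Applied to the example string from the text, `os_from_user_agent` returns `"Windows 7"`.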
The User-Agent field is not the only source of information within HTTP, however. Most operating systems include a built-in connectivity test that runs automatically when the device connects to a network. One example is the Network Connectivity Status Indicator (NCSI), an Internet connection awareness protocol in Microsoft’s Windows operating systems. It issues a sequence of specifically designed DNS and HTTP requests and checks the responses to determine whether the host is situated behind a captive portal or a proxy server. Because these probes target well-known, vendor-specific endpoints, simply observing them on the network reveals the OS of the device that sent them.
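The sketch below matches an observed HTTP request’s Host header and path against connectivity-check endpoints. The endpoint list is an assumption based on commonly documented probes (Windows NCSI, Apple’s captive-portal check, and Google’s Android check); vendors change these between releases, so the table should be verified against current OS behaviour. The function name is hypothetical.

```python
from typing import Optional

# Commonly documented connectivity-check endpoints (illustrative; verify
# against current OS releases, since vendors change them over time).
PROBE_SIGNATURES = [
    ("www.msftconnecttest.com", "/connecttest.txt", "Windows (NCSI)"),
    ("www.msftncsi.com", "/ncsi.txt", "Windows (NCSI, older releases)"),
    ("captive.apple.com", "/hotspot-detect.html", "Apple (macOS/iOS)"),
    ("connectivitycheck.gstatic.com", "/generate_204", "Android"),
]


def os_from_connectivity_probe(host: str, path: str) -> Optional[str]:
    """Match an HTTP request's Host header and path against known probes."""
    hostname = host.lower().split(":")[0]  # drop any ":port" suffix
    for probe_host, probe_path, label in PROBE_SIGNATURES:
        if hostname == probe_host and path.startswith(probe_path):
            return label
    return None
```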
Unveiling host identities through Dynamic Host Configuration Protocol (DHCP)
Because most networks use DHCP to assign IP addresses, DHCP traffic is a rich source of information for identifying hosts. A DHCP exchange follows a four-step process of Discover, Offer, Request, and Acknowledge (DORA), and the client’s messages along the way expose valuable details for analysis.
For instance, consider a Windows host broadcasting DHCP messages across the local network while it waits for a response from the DHCP server. Its messages carry the distinct vendor class identifier “MSFT 5.0” (DHCP option 60), and its parameter request list (option 55) contains a series of values that match the typical preferences of Windows-based hosts.
When the set and order of the DHCP options the client uses are also taken into account, these telltale signs combine into a compelling fingerprint that identifies the host as running Windows with high confidence.
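A minimal sketch of DHCP fingerprint extraction follows, assuming the input is the options field of a client’s DHCP message with the 4-byte magic cookie already removed. It collects the option order, the vendor class identifier (option 60), and the parameter request list (option 55); for brevity the classification step checks only the “MSFT 5.0” vendor class. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_dhcp_options(data: bytes) -> Tuple[List[int], Dict[int, bytes]]:
    """Parse DHCP options (after the magic cookie) into order and values."""
    order: List[int] = []
    values: Dict[int, bytes] = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:  # Pad option: a single byte, no length field
            i += 1
            continue
        if code == 255:  # End option: stop parsing
            break
        if i + 1 >= len(data):
            raise ValueError(f"truncated DHCP option at offset {i}")
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        if len(value) != length:
            raise ValueError(f"truncated DHCP option {code}")
        order.append(code)
        values[code] = value
        i += 2 + length
    return order, values


def dhcp_fingerprint(data: bytes) -> Tuple[str, Tuple[int, ...], Tuple[int, ...]]:
    """Return (vendor class, parameter request list, option order)."""
    order, values = parse_dhcp_options(data)
    vendor_class = values.get(60, b"").decode("ascii", errors="replace")
    request_list = tuple(values.get(55, b""))  # bytes -> tuple of option codes
    return vendor_class, request_list, tuple(order)


def guess_os_from_dhcp(data: bytes) -> Optional[str]:
    vendor_class, _request_list, _order = dhcp_fingerprint(data)
    # A fuller implementation would also compare the request list and option
    # order against a database of known fingerprints.
    if vendor_class.startswith("MSFT 5.0"):
        return "Windows"
    return None
```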
Adopting a multi-protocol strategy
Although certain protocols offer superior accuracy compared to others, there isn’t a one-size-fits-all solution for OS identification, because each protocol exposes a different facet of the device. Instead of relying solely on a single protocol for fingerprinting, the industry should adopt a multi-protocol strategy: for instance, combining the HTTP User-Agent with insights from lower-level TCP options. This combined approach can improve both the accuracy and the reliability of OS identification.
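One simple way to combine evidence is a weighted vote across per-protocol guesses. The sketch below assumes each protocol-specific classifier has already produced an OS label normalised to a shared vocabulary (e.g. “Windows” rather than “Windows 7”), and the per-source weights are hypothetical values chosen only to illustrate trusting user-editable application-layer fields slightly less than stack behaviour.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical per-source weights: application-layer fields can be changed by
# the user, so they count slightly less than TCP/IP or DHCP behaviour here.
SOURCE_WEIGHTS: Dict[str, float] = {
    "tcp": 1.0,
    "dhcp": 1.0,
    "http_user_agent": 0.8,
    "mac_oui": 0.5,
}


def combine_guesses(
    guesses: Iterable[Tuple[str, Optional[str]]],
) -> Tuple[Optional[str], float]:
    """Weighted vote over (source, os_label) pairs.

    Returns the winning label and its share of the total weight, which can
    serve as a crude confidence score. Sources with no guess are ignored.
    """
    scores: Dict[str, float] = defaultdict(float)
    for source, label in guesses:
        if label is not None:
            scores[label] += SOURCE_WEIGHTS.get(source, 0.5)
    if not scores:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```

A weighted vote keeps the logic transparent; a trained model, of the kind the machine-learning approach mentioned in the author bio suggests, could replace it once labelled traffic is available.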
Asaf Fried is a Data Scientist at Cato Networks, with a real passion for cybersecurity. He brings over six years of experience in both academic and industry settings, and his expertise lies in applying cutting-edge machine-learning techniques to address complex cybersecurity challenges.