Developing Plugins

PATHspider is written to be extensible and the plugins that are included in the PATHspider distribution are examples of the measurements that PATHspider can perform.

pathspider.plugins is a namespace package. Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk, so that one or more distributions can provide modules inside the same package. The plugins shipped with the PATHspider distribution are installed in pathspider.plugins, and third-party plugins can live in the same namespace without being part of the PATHspider distribution.

Quickstart

The directory layout and example plugin below can be found in the pathspider-example GitHub repository. You can get going quickly by forking this repository and using that as a basis for plugin development.

Directory Layout

To get started you will need to create the required directory layout for PATHspider plugins, in this case for the Example plugin:

pathspider-example
└── pathspider
    ├── __init__.py
    └── plugins
        ├── __init__.py
        └── example.py

Inside both __init__.py files, you will need to add the following (and only the following):

from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)

Your plugin will be written in example.py and this plugin will be discovered automatically when you run PATHspider.
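The effect of the two extend_path lines can be seen in isolation with the following minimal sketch. It builds two separate source trees that both provide a package named nsdemo.plugins (a hypothetical name chosen so that it cannot clash with an installed pathspider) and shows that modules from both trees become importable from the same namespace. It illustrates the mechanism only, not how PATHspider itself discovers plugins.

```python
import importlib
import os
import sys
import tempfile

# The same two lines the quickstart asks you to put in each __init__.py.
INIT = ("from pkgutil import extend_path\n"
        "__path__ = extend_path(__path__, __name__)\n")


def make_tree(root, module_name):
    """Create root/nsdemo/plugins/<module_name>.py with both __init__.py files."""
    plugins_dir = os.path.join(root, "nsdemo", "plugins")
    os.makedirs(plugins_dir)
    for directory in (os.path.join(root, "nsdemo"), plugins_dir):
        with open(os.path.join(directory, "__init__.py"), "w") as init_file:
            init_file.write(INIT)
    with open(os.path.join(plugins_dir, module_name + ".py"), "w") as module_file:
        module_file.write("NAME = %r\n" % module_name)


# Two independent "distributions", each contributing one plugin module.
tree_a = tempfile.mkdtemp()
tree_b = tempfile.mkdtemp()
make_tree(tree_a, "alpha")
make_tree(tree_b, "beta")

# Equivalent to putting both trees on PYTHONPATH.
sys.path[:0] = [tree_a, tree_b]
importlib.invalidate_caches()

alpha = importlib.import_module("nsdemo.plugins.alpha")
beta = importlib.import_module("nsdemo.plugins.beta")
found = sorted([alpha.NAME, beta.NAME])
print(found)
```

Without the extend_path lines, only the first nsdemo directory found on sys.path would be searched and the second import would fail.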

Example Plugin

The following code can be found in the quickstart example as a starting point for developing your plugin. If you are not using the quickstart example, you may copy and paste this code into a Python file under pathspider/plugins/ in the directory structure. This example is explained in the following sections.

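import sys
import collections
import logging
import socket

from pathspider.base import SynchronizedSpider
from pathspider.base import PluggableSpider
from pathspider.base import NO_FLOW

from pathspider.observer import simple_observer

# sock is the socket, port the local source port and state whether the
# connection succeeded.
Connection = collections.namedtuple("Connection", ["sock", "port", "state"])
SpiderRecord = collections.namedtuple("SpiderRecord", ["ip", "rport", "port",
                                                       "host", "config",
                                                       "connstate"])

class Example(SynchronizedSpider, PluggableSpider):

    """
    An example PATHspider plugin.
    """

    def config_zero(self):
        logger = logging.getLogger("example")
        logger.debug("Configuration zero")

    def config_one(self):
        logger = logging.getLogger("example")
        logger.debug("Configuration one")

    def connect(self, job, pcs, config):
        # Assumes job[0] is the target IP address and job[1] the target port.
        family = socket.AF_INET6 if ":" in job[0] else socket.AF_INET
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(5)
        try:
            sock.connect((job[0], int(job[1])))
            return Connection(sock, sock.getsockname()[1], True)
        except OSError:
            # Covers timeouts and refused or unreachable connections.
            return Connection(sock, 0, False)

    def post_connect(self, job, conn, pcs, config):
        # Assumes job[2] is the hostname of the target.
        conn.sock.close()
        rec = SpiderRecord(job[0], job[1], conn.port, job[2], config,
                           conn.state)
        return rec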

    def create_observer(self):
        logger = logging.getLogger("example")
        try:
            return simple_observer()
        except Exception:
            logger.error("Observer would not start")
            sys.exit(-1)

    def merge(self, flow, res):
        if flow == NO_FLOW:
            flow = {"dip": res.ip,
                    "sp": res.port,
                    "dp": res.rport,
                    "observed": False}
        else:
            flow['observed'] = True

        self.outqueue.put(flow)

    @staticmethod
    def register_args(subparsers):
        parser = subparsers.add_parser('example', help="Example starting point for development")
        parser.set_defaults(spider=Example)

You will need to provide implementations for each of these functions, which are explained next. We’ll start with the connection logic.

Connection Logic

Configurator

These functions perform global changes that may be required between performing the baseline (A) and the experimental (B) configurations. The changes may be a call to sysctl, changes via netfilter or a call to a robot arm to reposition the satellite array. In the event that global state changes are not required, these can be implemented as no-ops.

An example implementation of these methods can be found in the ECN plugin:

ECN.config_zero()

Disables ECN negotiation via sysctl.

ECN.config_one()

Enables ECN negotiation via sysctl.
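As a rough illustration of a configurator pair, the sketch below toggles the Linux net.ipv4.tcp_ecn sysctl. It is not the ECN plugin's actual code: the class name ConfiguratorSketch, the helper ecn_sysctl_command and the injectable runner argument are all hypothetical, introduced so that the example can be exercised without root privileges.

```python
import subprocess


def ecn_sysctl_command(enabled):
    """Build the sysctl command line (hypothetical helper)."""
    # On Linux, net.ipv4.tcp_ecn=1 requests ECN on outgoing connections
    # and net.ipv4.tcp_ecn=0 disables ECN entirely.
    value = 1 if enabled else 0
    return ["sysctl", "-w", "net.ipv4.tcp_ecn=%d" % value]


class ConfiguratorSketch:
    """Stand-in for a spider class; only the two configurator methods."""

    def __init__(self, runner=subprocess.check_call):
        # runner executes a command; the default needs root to succeed.
        self.runner = runner

    def config_zero(self):
        # Baseline (A): ECN disabled.
        self.runner(ecn_sysctl_command(False))

    def config_one(self):
        # Experimental (B): ECN enabled.
        self.runner(ecn_sysctl_command(True))


# Record the commands instead of running them, so no privileges are needed.
calls = []
sketch = ConfiguratorSketch(runner=calls.append)
sketch.config_zero()
sketch.config_one()
print(calls)
```

In a real plugin the two methods would sit directly on the spider class and use the default runner, which requires the privileges needed to change the sysctl.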

(Pre-,Post-) Connection

The pre-connection function will run only once, and the result of the pre-connection operation will be available to both runs of the connection and post-connection functions.

If you need to pass different values depending on the configuration, you can perform two operations in the pre-connection function, return both results as a tuple, and select the value to use based on the configuration in the later functions.
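The tuple pattern described above might look like the following sketch. The functions are plain stand-ins for the methods on your spider class, the pre_connect name and signature are assumptions, and it is assumed that config is 0 for configuration zero and 1 for configuration one.

```python
def pre_connect(job):
    # Runs once per job: prepare one value for each configuration.
    value_a = "baseline request for %s" % job[0]
    value_b = "experimental request for %s" % job[0]
    return (value_a, value_b)


def connect(job, pcs, config):
    # pcs is whatever pre_connect returned; pick the entry for this run.
    # Assumption: config is 0 or 1 and so can be used as a tuple index.
    return pcs[config]


job = ("192.0.2.1", 80, "example.com")
pcs = pre_connect(job)
chosen = [connect(job, pcs, config) for config in (0, 1)]
print(chosen)
```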

An example implementation of these methods can be found in the ECN plugin:

ECN.connect(job, pcs, config)

Performs a TCP connection.

ECN.post_connect(job, conn, pcs, config)

Close the socket gracefully.

Observer Functions

PATHspider’s observer will accept functions and pass python-libtrace dissected packets, along with the associated flow record, to them for every packet received.

The pathspider.observer module provides pathspider.observer.simple_observer() which allows the creation of a very simple Observer during development of the other portions of the plugin. There are two simple examples of observer functions that are used in the observer created by this function.

When you are ready to start working with your own Observer functions, you will need to expand your create_observer() function. You can use the following example:

from pathspider.observer import Observer
from pathspider.observer import basic_flow
from pathspider.observer import basic_count

class Example(SynchronizedSpider, PluggableSpider):

    [...]

    def create_observer(self):
        logger = logging.getLogger("example")
        try:
            return Observer(self.libtrace_uri,
                            new_flow_chain=[basic_flow],
                            ip4_chain=[basic_count],
                            ip6_chain=[basic_count])
        except Exception:
            logger.error("Observer would not start")
            sys.exit(-1)

Depending on the types of analysis you would like to do on the packets, you should pass your functions to the appropriate chain:

Function Chain    Description
new_flow_chain    Functions to initialise fields in the flow record for new flows.
ip4_chain         Functions to record details from IPv4 headers.
ip6_chain         Functions to record details from IPv6 headers.
tcp_chain         Functions to record details from TCP headers.
udp_chain         Functions to record details from UDP headers.
l4_chain          Functions to record details from other layer 4 headers.

Library Observer Functions

The pathspider.observer.basic_flow() function simply creates the initial state for the flow record, extracting the 5-tuple and initialising counters. The counters are used by the pathspider.observer.basic_count() function, which counts the number of packets and octets seen in each direction. Combined, these allow your plugin to produce the default output fields.

PATHspider also provides library observer functions for some protocols, such as pathspider.observer.tcp.tcp_complete() for TCP.

Writing Observer Functions

When you are ready to write functions for the observer, first identify which data should be stored in the flow record. This is a dict that is made available for every call to an observer function for a particular flow and not shared across flows. Once the flow is completed, this is the record that will be returned to the merger.

The flow record should be initialised when a new flow has been identified. The functions in the new_flow_chain are called, in sequence, when a new flow is identified by the Observer. These functions are passed two arguments: rec - the empty flow record, and ip - the IP header.

You should familiarise yourself with the python-libtrace documentation. The analysis functions all follow the same function prototype, taking three arguments: rec - the flow record, x - the header, and rev - a boolean value indicating the direction the packet travelled (i.e. was the packet in the reverse direction?).

The only difference in these functions is the header that is passed, as a python-libtrace object, to the function. The same flow record is always passed for each call for the same flow, regardless of which function chain the function is in.

If a function returns False because it has identified the end of the flow, the Observer will consider the flow to be finished and will pass it to be merged with the job record after a short delay. For example, the pathspider.observer.tcp.tcp_complete() function does this for TCP flows once both FIN packets have been seen.
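The calling convention described in this section can be sketched without python-libtrace as follows. The header objects here are simple stand-ins with a fin_flag attribute, not real python-libtrace objects, and the function and field names are hypothetical. Returning True to keep the flow open is also an assumption; the text above only specifies what returning False means.

```python
from types import SimpleNamespace


def example_new_flow(rec, ip):
    # new_flow_chain function: initialise the fields used later on.
    rec["pkts_fwd"] = 0
    rec["pkts_rev"] = 0
    rec["fin_fwd"] = False
    rec["fin_rev"] = False
    return True


def example_tcp(rec, tcp, rev):
    # tcp_chain function: count packets and note FINs per direction.
    direction = "rev" if rev else "fwd"
    rec["pkts_" + direction] += 1
    if tcp.fin_flag:
        rec["fin_" + direction] = True
    # False signals the end of the flow once both FINs have been seen.
    return not (rec["fin_fwd"] and rec["fin_rev"])


# Simulate what the Observer does for a single flow.
rec = {}
example_new_flow(rec, SimpleNamespace())

# Each packet is (rev, fin): direction flag and whether FIN is set.
packets = [(False, False), (True, False), (False, True), (True, True)]
results = [example_tcp(rec, SimpleNamespace(fin_flag=fin), rev)
           for rev, fin in packets]
print(results, rec)
```

Only the final call returns False, because that is the point at which a FIN has been seen in both directions.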

Merging

The merge function will be called for every job and given the job record and the observer record. It is responsible for producing the final record to be recorded in the dataset for the measurement run; the example plugin above does this by placing the merged record on self.outqueue.

Warning

It is possible for the Observer to return a NO_FLOW object in some circumstances, where the flow has not been observed. Any implementation must handle this gracefully.

An example implementation of this method can be found in the ECN plugin:

ECN.merge(flow, res)

Merge flow records.

Includes the configuration and connection success or failure of the socket connection with the flow record.
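A merge function along these lines might look like the sketch below. It is not the ECN plugin's code: NO_FLOW is replaced by a local stand-in sentinel, the output queue is passed in explicitly instead of being read from self.outqueue, and the field names follow the example plugin earlier on this page.

```python
import collections
import queue

# Stand-in for pathspider.base.NO_FLOW so the sketch runs on its own.
NO_FLOW = None

SpiderRecord = collections.namedtuple("SpiderRecord", ["ip", "rport", "port",
                                                       "host", "config",
                                                       "connstate"])


def merge(flow, res, outqueue):
    if flow is NO_FLOW:
        # Nothing was observed: build a minimal record from the job result.
        flow = {"dip": res.ip, "sp": res.port, "dp": res.rport,
                "observed": False}
    else:
        flow = dict(flow)
        flow["observed"] = True
    # Attach the configuration and the connection outcome.
    flow["config"] = res.config
    flow["connstate"] = res.connstate
    outqueue.put(flow)


out = queue.Queue()
res = SpiderRecord("192.0.2.1", 80, 40000, "example.com", 1, True)
merge(NO_FLOW, res, out)
merge({"dip": "192.0.2.1", "sp": 40000, "dp": 80}, res, out)
unobserved = out.get()
observed = out.get()
print(unobserved, observed)
```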

Running Your Plugin

In order to run your plugin, in the root of your plugin source tree run:

PYTHONPATH=. pathspider example </usr/share/doc/pathspider/examples.csv >results.txt

Unless you install your plugin, you will need to add the plugin tree to the PYTHONPATH to allow the plugin to be discovered.