Abstract Spider

The core functionality of PATHspider is implemented in two classes: pathspider.base.SynchronizedSpider and pathspider.base.DesynchronizedSpider. Both inherit from pathspider.base.Spider, an abstract base class that provides the skeleton of methods that any plugin must or may implement. The documentation for the pathspider.base module, including this base class, is below:

pathspider.base

Basic framework for PATHspider: coordinate active measurements on large target lists with both system-level network stack state (sysctls, iptables rules, etc.) and information derived from flow-level passive observation of traffic at the sender.

Derived and generalized from ECN Spider (c) 2014 Damiano Boppart <hat.guy.repo@gmail.com>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

class pathspider.base.Conn
class pathspider.base.Connection(client, port, state, tstart)
client

Alias for field number 0

port

Alias for field number 1

state

Alias for field number 2

tstart

Alias for field number 3
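The "Alias for field number N" entries are the text Python generates for the fields of a collections.namedtuple, so Connection can be read as a four-field named tuple. The sketch below assumes exactly that; the field values used are hypothetical placeholders.

```python
from collections import namedtuple

# Assumption: Connection is a plain namedtuple with these four fields,
# which is what the "Alias for field number N" entries suggest.
Connection = namedtuple("Connection", ["client", "port", "state", "tstart"])

# Hypothetical values: no socket, local port 40000, a made-up state, start time 0.0.
conn = Connection(client=None, port=40000, state="ok", tstart=0.0)

# Each field can be read by name or by position.
print(conn.port, conn[1])
```

Because it is a tuple, a Connection is immutable and can be unpacked directly, e.g. `client, port, state, tstart = conn`.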

class pathspider.base.DesynchronizedSpider(worker_count, libtrace_uri, args)
config_one()
config_zero()
configurator()

Since there is no need for a configurator thread in a desynchronized spider, this method is a no-op.

worker(worker_number)

This function provides the logic for desynchronized worker threads.

Parameters: worker_number (int) – The unique number of the worker.

The workers operate as continuous loops:

  • Fetch next job from the job queue
  • Perform pre-connection operations
  • Perform the “config_zero” connection
  • Perform the “config_one” connection
  • Perform post-connection operations for config_zero and pass the result to the merger
  • Perform post-connection operations for config_one and pass the result to the merger
  • Do it all again

If the job fetched is the SHUTDOWN_SENTINEL, then the worker will terminate as this indicates that all the jobs have now been processed.
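A desynchronized spider has no configurator thread (its configurator() is a no-op), so a worker loop for it needs no configuration locks. The following is a minimal sketch of such a loop, not PATHspider's code. ToyDesyncSpider, its resqueue (standing in for the merger's input), the toy hook bodies and the SHUTDOWN_SENTINEL object are hypothetical. Only the hook names and their call order follow the Spider API.

```python
import queue
import threading

SHUTDOWN_SENTINEL = object()  # hypothetical stand-in for PATHspider's sentinel


class ToyDesyncSpider:
    """Hypothetical plugin-like object; hook names mirror the Spider API."""

    def __init__(self):
        self.jobqueue = queue.Queue()
        self.resqueue = queue.Queue()  # stands in for the merger's input

    def pre_connect(self, job):
        return {"seen": job}

    def connect(self, job, pcs, config):
        # A desynchronized plugin applies `config` itself, per connection.
        return {"job": job, "config": config}

    def post_connect(self, job, conn, pcs, config):
        return dict(conn, pcs=pcs)

    def worker(self, worker_number):
        while True:
            job = self.jobqueue.get()
            if job is SHUTDOWN_SENTINEL:
                # All real jobs were queued ahead of the sentinels.
                self.jobqueue.task_done()
                break
            pcs = self.pre_connect(job)        # once per job, shared by A and B
            conn0 = self.connect(job, pcs, 0)  # no lock: nothing to wait for
            conn1 = self.connect(job, pcs, 1)
            self.resqueue.put(self.post_connect(job, conn0, pcs, 0))
            self.resqueue.put(self.post_connect(job, conn1, pcs, 1))
            self.jobqueue.task_done()


# Usage: two workers, three jobs, then one sentinel per worker.
spider = ToyDesyncSpider()
threads = [threading.Thread(target=spider.worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for job in ["a", "b", "c"]:
    spider.jobqueue.put(job)
for _ in threads:
    spider.jobqueue.put(SHUTDOWN_SENTINEL)
for t in threads:
    t.join()
results = [spider.resqueue.get() for _ in range(spider.resqueue.qsize())]
```

Queueing one sentinel per worker after the last job means every job is dequeued before any worker sees its sentinel.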

class pathspider.base.PluggableSpider
static register_args(subparsers)
class pathspider.base.SemaphoreN(value)

An extension to the standard library’s BoundedSemaphore that provides functions to handle n tokens at once.

acquire_n(value=1, blocking=True, timeout=None)

Acquire value tokens at once.

The parameters blocking and timeout have the same semantics as for BoundedSemaphore.

Returns: The value that the last call to BoundedSemaphore’s acquire() would have returned if acquire() had been called value times instead of calling this method.

empty()

Acquire all tokens of the semaphore.

release_n(value=1)

Release value tokens at once.

Returns: The value that the last call to BoundedSemaphore’s release() would have returned if release() had been called value times instead of calling this method.
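A minimal sketch of how such a class could be built on threading.BoundedSemaphore follows. It assumes that the n-token methods simply loop over the single-token calls, which is what the documented return semantics describe, but the real implementation may differ. The _capacity attribute is a hypothetical name used so that empty() knows how many tokens exist.

```python
import threading


class SemaphoreN(threading.BoundedSemaphore):
    """Sketch: a BoundedSemaphore that can take or return n tokens per call."""

    def __init__(self, value):
        super().__init__(value)
        self._capacity = value  # hypothetical: remember the bound for empty()

    def acquire_n(self, value=1, blocking=True, timeout=None):
        # Take one token at a time; return what the last acquire() returned.
        result = None
        for _ in range(value):
            result = self.acquire(blocking=blocking, timeout=timeout)
        return result

    def release_n(self, value=1):
        # Return one token at a time; raises ValueError past the bound.
        result = None
        for _ in range(value):
            result = self.release()
        return result

    def empty(self):
        # Take every token, blocking until all current holders release theirs.
        self.acquire_n(self._capacity)
```

With three tokens, `acquire_n(2)` leaves one token available, and `empty()` blocks until all three tokens can be held by the caller. Because the class extends BoundedSemaphore, releasing more tokens than were acquired still raises ValueError.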

class pathspider.base.Spider(worker_count, libtrace_uri, args)

A spider consists of a configurator (which alternates between two system configurations), a large number of workers (which perform some network action for each configuration), an Observer (which derives information from passively observed traffic), and a merger thread that merges results from the workers with flow records from the observer.

add_job(job)

Adds a job to the job queue.

If PATHspider is currently stopping, the job will not be added to the queue.

config_one()

Changes the global state or system configuration for the experimental measurements.

config_zero()

Changes the global state or system configuration for the baseline measurements.

configurator()
connect(job, pcs, config)

Performs the connection.

Parameters:
  • job (dict) – The job record.
  • pcs (dict) – The result of the pre-connection operation(s).
  • config (int) – The current state of the configurator (0 or 1).
Returns:

object – Any result of the connect operation to be passed to pathspider.base.Spider.post_connect().

The connect function is used to perform the connection operation and is run for both the A and B test. This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.

Sockets created during this operation can be returned by the function for use in the post-connection phase, to minimise the time that the configurator is blocked from moving to the next configuration.
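As a minimal sketch of that pattern, a plugin's connect() can open a TCP socket and hand it to post_connect() for cleanup, keeping the synchronized part short. The class name, the job keys dip and dp, the conn_timeout default and the shape of the returned dicts are all hypothetical. A real plugin would subclass one of the spider classes rather than stand alone.

```python
import socket


class HypotheticalTCPPlugin:
    """Sketch of the connect/post_connect contract, not a real PATHspider plugin."""

    conn_timeout = 5.0  # hypothetical default, in seconds

    def connect(self, job, pcs, config):
        # In a synchronized spider the configurator waits on this call,
        # so only open the connection here and return immediately.
        try:
            sock = socket.create_connection(
                (job["dip"], job["dp"]), timeout=self.conn_timeout)
        except OSError:
            return {"sock": None, "ok": False}
        return {"sock": sock, "ok": True}

    def post_connect(self, job, conn, pcs, config):
        # Runs outside the configurator's control; close what connect() opened.
        if conn["sock"] is not None:
            conn["sock"].close()
        return {"config": config, "connected": conn["ok"]}
```

Slower work, such as reading a response or inspecting socket options, would also belong in post_connect() under this design, since it no longer holds up the configurator.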

create_observer()

Create a flow observer.

This function is called by the base Spider logic to get an instance of pathspider.observer.Observer configured with the function chains that are required by the plugin.

This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.

For more information on how to use the flow observer, see Observer.

exception_wrapper(target, *args, **kwargs)
merge(flow, res)

Merge a job record with a flow record.

Parameters:
  • flow (dict) – The flow record.
  • res (dict) – The job record.
Returns:

tuple – Final record for job.

In order to create a final record for reporting on a job, the final job record must be merged with the flow record. This function should be implemented by any plugin to provide the logic for this merge as the keys used in these records cannot be known by PATHspider in advance.

This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.

merger()

Thread to merge results from the workers and the observer.

post_connect(job, conn, pcs, config)

Performs post-connection operations.

Parameters:
  • job (dict) – The job record.
  • conn (object) – The result of the connection operation(s).
  • pcs (dict) – The result of the pre-connection operation(s).
  • config (int) – The state of the configurator during pathspider.base.Spider.connect().
Returns:

dict – Result of the post-connection operation(s).

The post_connect function can be used to perform any operations that must be performed after each connection. It will be run for both the A and the B configuration, and is not synchronised with the configurator.

Plugins to PATHspider can optionally implement this function. If it is not overridden, it is a no-op.

Any sockets or other file handles that were opened during pathspider.base.Spider.connect() should be closed in this function if they have not been already.

pre_connect(job)

Performs pre-connection operations.

Parameters: job (dict) – The job record.
Returns: dict – Result of the pre-connection operation(s).

The pre_connect function can be used to perform any operations that must be performed before each connection. It will be run only once per job, with the same result passed to both the A and B connect calls. This function is not synchronised with the configurator.

Plugins to PATHspider can optionally implement this function. If it is not overridden, it is a no-op.
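For example, a plugin might resolve its target once in pre_connect() so that both the A and the B connection go to the same address. This is a minimal sketch written as a free function for brevity (in a plugin it would be a method). The host and port job keys and the returned dict layout are hypothetical.

```python
import socket


def pre_connect(job):
    """Hypothetical pre_connect: resolve the target once per job."""
    # getaddrinfo runs once here instead of once in each connect() call,
    # so both configurations use the same resolved address.
    family, _, _, _, sockaddr = socket.getaddrinfo(
        job["host"], job["port"], type=socket.SOCK_STREAM)[0]
    return {"family": family, "sockaddr": sockaddr}
```

Each connect() call would then read pcs["sockaddr"] rather than resolving the name again, which also avoids the two connections seeing different DNS answers.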

shutdown()

Shut down PATHspider in an orderly fashion, ensuring that all queued jobs complete and all available results are merged.

start()

This function starts a PATHspider plugin.

In order to run, the plugin must have first been activated by calling its activate() method. This function causes the following to happen:

  • Set the running flag
  • Create a pathspider.observer.Observer and start its process
  • Start the merger thread
  • Start the configurator thread
  • Start the worker threads

The number of worker threads to start was given when activating the plugin.

terminate()

Shut down PATHspider as quickly as possible, without any regard to completeness of results.
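The difference between shutdown() and terminate(), and the role of the stopping check in add_job(), can be illustrated with a minimal sketch. It models only the worker threads (no observer, merger or configurator). The LifecycleSketch class, its stopping and running attributes, the sentinel object and the polling timeout are assumptions, not PATHspider's actual fields.

```python
import queue
import threading

SHUTDOWN_SENTINEL = object()  # hypothetical stand-in for PATHspider's sentinel


class LifecycleSketch:
    def __init__(self, worker_count):
        self.worker_count = worker_count
        self.jobqueue = queue.Queue()
        self.done = []          # list.append is thread-safe in CPython
        self.stopping = False
        self.running = False
        self.threads = []

    def add_job(self, job):
        if self.stopping:       # refuse new work once shutdown has begun
            return
        self.jobqueue.put(job)

    def worker(self):
        while self.running:
            try:
                job = self.jobqueue.get(timeout=0.05)  # poll so terminate() is noticed
            except queue.Empty:
                continue
            if job is SHUTDOWN_SENTINEL:
                self.jobqueue.task_done()
                break
            self.done.append(job)
            self.jobqueue.task_done()

    def start(self):
        self.running = True
        self.threads = [threading.Thread(target=self.worker)
                        for _ in range(self.worker_count)]
        for t in self.threads:
            t.start()

    def shutdown(self):
        # Orderly: stop accepting jobs, then queue one sentinel per worker
        # behind the remaining jobs so every queued job is still processed.
        self.stopping = True
        for _ in self.threads:
            self.jobqueue.put(SHUTDOWN_SENTINEL)
        for t in self.threads:
            t.join()
        self.running = False

    def terminate(self):
        # Fast: clear the running flag; workers exit at their next poll and
        # anything still queued is abandoned.
        self.stopping = True
        self.running = False
        for t in self.threads:
            t.join()
```

The orderly path relies on the queue being FIFO: sentinels added after the last job can only be reached once every earlier job has been taken.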

worker()
class pathspider.base.SynchronizedSpider(worker_count, libtrace_uri, args)
configurator()

Thread which synchronizes on a set of semaphores and alternates between two system states.

tcp_connect(job)

This helper function performs a plain TCP connection. It takes no special action when the connection belongs to the experimental configuration; it only performs a TCP connection. It expects self.conn_timeout to have been set to a sensible value.
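A minimal sketch of such a helper is shown below, returning the documented Connection fields (client, port, state, tstart). It is written as a free function that takes conn_timeout as an argument instead of reading self.conn_timeout. The job layout (ip, port), the CONN_* state constants, and the use of the local source port as the port field are all assumptions.

```python
import socket
import time
from collections import namedtuple

Connection = namedtuple("Connection", ["client", "port", "state", "tstart"])

# Hypothetical state values; PATHspider's own constants may differ.
CONN_OK, CONN_FAILED, CONN_TIMEOUT = 0, 1, 2


def tcp_connect(job, conn_timeout):
    """Plain TCP connect with no experiment-specific behaviour.

    Assumes job is an (ip, port) tuple.
    """
    ip, port = job
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(conn_timeout)
    tstart = time.time()
    try:
        sock.connect((ip, port))
        state = CONN_OK
    except socket.timeout:      # must precede OSError, its base class
        state = CONN_TIMEOUT
    except OSError:
        state = CONN_FAILED
    # Assumption: the local (source) port is what the port field carries.
    return Connection(sock, sock.getsockname()[1], state, tstart)
```

The caller owns the returned socket and should close it, for example in post_connect().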

worker(worker_number)

This function provides the logic for configuration-synchronized worker threads.

Parameters: worker_number (int) – The unique number of the worker.

The workers operate as continuous loops:

  • Fetch next job from the job queue
  • Perform pre-connection operations
  • Acquire a lock for “config_zero”
  • Perform the “config_zero” connection
  • Release “config_zero”
  • Acquire a lock for “config_one”
  • Perform the “config_one” connection
  • Release “config_one”
  • Perform post-connection operations for config_zero and pass the result to the merger
  • Perform post-connection operations for config_one and pass the result to the merger
  • Do it all again

If the job fetched is the SHUTDOWN_SENTINEL, then the worker will terminate as this indicates that all the jobs have now been processed.
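The handshake between the configurator and the workers can be sketched as follows. This is a simplified stand-in, not PATHspider's code: it replaces the documented set of semaphores with a single threading.Condition, which is easier to shut down cleanly in a short example. SyncSketch, run_in, run and the dwell parameter are hypothetical names. The invariant it demonstrates is the one the list describes: a worker only connects while the system is in the matching configuration, and the configurator only switches when no connection is in progress.

```python
import queue
import threading

SHUTDOWN_SENTINEL = object()  # hypothetical stand-in for PATHspider's sentinel


class SyncSketch:
    """Toy synchronized spider built on one Condition instead of semaphores."""

    def __init__(self, worker_count):
        self.worker_count = worker_count
        self.jobqueue = queue.Queue()
        self.resqueue = queue.Queue()
        self.cond = threading.Condition()
        self.state = 0       # configuration currently applied (0 or 1)
        self.in_flight = 0   # connections running under self.state
        self.running = True

    def connect(self, job, config):
        # Stand-in for a plugin's connect(): record the state it actually saw.
        return (job, config, self.state)

    def run_in(self, config, job):
        # "Acquire the lock": wait until the configurator has applied `config`.
        with self.cond:
            self.cond.wait_for(lambda: self.state == config)
            self.in_flight += 1
        try:
            return self.connect(job, config)
        finally:
            # "Release": once nothing is in flight, the configurator may switch.
            with self.cond:
                self.in_flight -= 1
                self.cond.notify_all()

    def worker(self):
        while True:
            job = self.jobqueue.get()
            if job is SHUTDOWN_SENTINEL:
                self.jobqueue.task_done()
                return
            conn0 = self.run_in(0, job)
            conn1 = self.run_in(1, job)
            # post_connect() would run here, outside any synchronization.
            self.resqueue.put(conn0)
            self.resqueue.put(conn1)
            self.jobqueue.task_done()

    def configurator(self, dwell=0.001):
        with self.cond:
            while self.running:
                # Never change configuration while a connection is in progress.
                self.cond.wait_for(lambda: self.in_flight == 0)
                self.state = 1 - self.state  # config_zero()/config_one() would run here
                self.cond.notify_all()
                # Drop the lock briefly so waiting workers can start connecting.
                self.cond.wait(timeout=dwell)


def run(jobs, worker_count=4):
    spider = SyncSketch(worker_count)
    configurator = threading.Thread(target=spider.configurator)
    workers = [threading.Thread(target=spider.worker) for _ in range(worker_count)]
    configurator.start()
    for w in workers:
        w.start()
    for job in jobs:
        spider.jobqueue.put(job)
    for _ in workers:
        spider.jobqueue.put(SHUTDOWN_SENTINEL)
    for w in workers:
        w.join()
    # Stop the configurator only after every worker has finished.
    with spider.cond:
        spider.running = False
        spider.cond.notify_all()
    configurator.join()
    return [spider.resqueue.get() for _ in range(spider.resqueue.qsize())]
```

Calling `run(range(20))` should yield 40 records, and in every record the configuration a connection asked for matches the state it observed.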