Abstract Spider
The core functionality of PATHspider is implemented in two classes: pathspider.base.SynchronizedSpider and pathspider.base.DesynchronizedSpider. Both inherit from the base class pathspider.base.Spider, which provides a skeleton with the functions required by any plugin. The documentation for the base module is below:
pathspider.base
Basic framework for PATHspider: coordinate active measurements on large target lists with both system-level network stack state (sysctls, iptables rules, etc.) and information derived from flow-level passive observation of traffic at the sender.
Derived and generalized from ECN Spider (c) 2014 Damiano Boppart <hat.guy.repo@gmail.com>
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
-
class
pathspider.base.
Connection
(client, port, state, tstart)¶ -
client
¶ Alias for field number 0
-
port
¶ Alias for field number 1
-
state
¶ Alias for field number 2
-
tstart
¶ Alias for field number 3
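The "Alias for field number N" descriptions are what the standard library generates for namedtuple fields, so a Connection record can be approximated as in the sketch below. The field values are hypothetical; a real plugin would store its own socket and state constants.

```python
from collections import namedtuple
import time

# Assumption: a plain namedtuple with the four documented fields, in this order.
Connection = namedtuple("Connection", ["client", "port", "state", "tstart"])

# Hypothetical values for illustration only.
conn = Connection(client=None, port=49152, state="OK", tstart=time.time())

# Fields are reachable by name or by position.
client, port, state, tstart = conn
```

Because the record is a tuple, it is immutable and can be unpacked directly into the four names.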
-
-
class
pathspider.base.
DesynchronizedSpider
(worker_count, libtrace_uri, args)[source]¶ -
-
configurator
()[source]¶ Since there is no need for a configurator thread in a desynchronized spider, this thread is a no-op
-
worker
(worker_number)[source]¶ This function provides the logic for configuration-synchronized worker threads.
Parameters: worker_number (int) – The unique number of the worker.
The workers operate as continuous loops:
- Fetch next job from the job queue
- Perform pre-connection operations
- Acquire a lock for “config_zero”
- Perform the “config_zero” connection
- Release “config_zero”
- Acquire a lock for “config_one”
- Perform the “config_one” connection
- Release “config_one”
- Perform post-connection operations for config_zero and pass the result to the merger
- Perform post-connection operations for config_one and pass the result to the merger
- Do it all again
If the job fetched is the SHUTDOWN_SENTINEL, then the worker will terminate as this indicates that all the jobs have now been processed.
-
-
class
pathspider.base.
SemaphoreN
(value)[source]¶ An extension to the standard library’s BoundedSemaphore that provides functions to handle n tokens at once.
-
acquire_n
(value=1, blocking=True, timeout=None)[source]¶ Acquire value number of tokens at once.
The parameters blocking and timeout have the same semantics as for BoundedSemaphore.
Returns: The same value as the last call to BoundedSemaphore's acquire() if acquire() were called value times instead of the call to this method.
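The behaviour described for acquire_n can be illustrated with a minimal sketch. This is not PATHspider's implementation: the class name SemaphoreNSketch is hypothetical, and the method simply calls acquire() value times and returns the last result, as the description above suggests.

```python
import threading


class SemaphoreNSketch(threading.BoundedSemaphore):
    """Illustrative stand-in: acquire several tokens in one call."""

    def acquire_n(self, value=1, blocking=True, timeout=None):
        ret = None
        # Call acquire() `value` times and keep the result of the last call.
        for _ in range(value):
            ret = self.acquire(blocking=blocking, timeout=timeout)
        return ret


sem = SemaphoreNSketch(3)
first = sem.acquire_n(2)                    # two of three tokens taken
second = sem.acquire_n(2, blocking=False)   # only one token was left
```

In this sketch a failed non-blocking call may still have taken some tokens, so a caller that cares about partial acquisition would need to track and release them itself.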
-
-
class
pathspider.base.
Spider
(worker_count, libtrace_uri, args)[source]¶ A spider consists of a configurator (which alternates between two system configurations), a large number of workers (for performing some network action for each configuration), an Observer which derives information from passively observed traffic, and a thread that merges results from the workers with flow records from the collector.
-
add_job
(job)[source]¶ Adds a job to the job queue.
If PATHspider is currently stopping, the job will not be added to the queue.
-
config_one
()[source]¶ Changes the global state or system configuration for the experimental measurements.
-
config_zero
()[source]¶ Changes the global state or system configuration for the baseline measurements.
-
connect
(job, pcs, config)[source]¶ Performs the connection.
Parameters: - job (dict) – The job record.
- pcs (dict) – The result of the pre-connection operation(s).
- config (int) – The current state of the configurator (0 or 1).
Returns: object – Any result of the connect operation to be passed to pathspider.base.Spider.post_connect().
The connect function is used to perform the connection operation and is run for both the A and B tests. This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.
Sockets created during this operation can be returned by the function for use in the post-connection phase, to minimise the time that the configurator is blocked from moving to the next configuration.
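A plugin's connect step might look like the standalone sketch below. It is written as a plain function rather than a method so that it runs without PATHspider installed. The job keys "ip" and "port", the timeout argument, and the shape of the returned dict are all assumptions made for illustration.

```python
import socket


def connect(job, pcs, config, timeout=5.0):
    """Open a TCP connection and hand the socket back to the caller."""
    sock = None
    try:
        # Hypothetical job keys: "ip" and "port".
        sock = socket.create_connection((job["ip"], job["port"]), timeout=timeout)
        state = "connected"
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        state = "failed"
    # The socket is returned still open so that it can be closed later,
    # outside the section that blocks the configurator.
    return {"sock": sock, "state": state, "config": config}
```

Returning the open socket keeps the synchronised part of the job short, which is the point made in the paragraph above.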
-
create_observer
()[source]¶ Create a flow observer.
This function is called by the base Spider logic to get an instance of pathspider.observer.Observer configured with the function chains that are required by the plugin.
This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.
For more information on how to use the flow observer, see Observer.
-
merge
(flow, res)[source]¶ Merge a job record with a flow record.
Parameters: - flow (dict) – The flow record.
- res (dict) – The job record.
Returns: tuple – Final record for job.
In order to create a final record for reporting on a job, the final job record must be merged with the flow record. This function should be implemented by any plugin to provide the logic for this merge as the keys used in these records cannot be known by PATHspider in advance.
This method is not implemented in the abstract pathspider.base.Spider class and must be implemented by any plugin.
-
post_connect
(job, conn, pcs, config)[source]¶ Performs post-connection operations.
Parameters: - job (dict) – The job record.
- conn (object) – The result of the connection operation(s).
- pcs (dict) – The result of the pre-connection operation(s).
- config (int) – The state of the configurator during pathspider.base.Spider.connect().
Returns: dict – Result of the post-connection operation(s).
The post_connect function can be used to perform any operations that must be performed after each connection. It will be run for both the A and the B configuration, and is not synchronised with the configurator.
Plugins to PATHspider can optionally implement this function. If this function is not overridden, it will be a no-op.
Any sockets or other file handles that were opened during pathspider.base.Spider.connect() should be closed in this function if they have not been already.
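The clean-up responsibility described above could be handled as in the following sketch. It is a standalone function, not PATHspider code, and it assumes that the connect step returned a dict with hypothetical "sock" and "state" keys.

```python
import socket


def post_connect(job, conn, pcs, config):
    """Close anything the connect step left open and build a job result."""
    sock = conn.get("sock")
    if sock is not None:
        # Closing an already closed socket is harmless.
        sock.close()
    result = dict(job)      # copy so the original job record is not mutated
    result.update(pcs)      # carry over pre-connection results
    result["config"] = config
    result["state"] = conn.get("state")
    return result
```

Since this step is not synchronised with the configurator, slower work such as closing sockets fits here without delaying other workers.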
-
pre_connect
(job)[source]¶ Performs pre-connection operations.
Parameters: job (dict) – The job record.
Returns: dict – Result of the pre-connection operation(s).
The pre_connect function can be used to perform any operations that must be performed before each connection. It will be run only once per job, with the same result passed to both the A and B connect calls. This function is not synchronised with the configurator.
Plugins to PATHspider can optionally implement this function. If this function is not overridden, it will be a no-op.
-
shutdown
()[source]¶ Shut down PATHspider in an orderly fashion, ensuring that all queued jobs complete, and all available results are merged.
-
start
()[source]¶ This function starts a PATHspider plugin.
In order to run, the plugin must have first been activated by calling its activate() method. This function causes the following to happen:
- Set the running flag
- Create a pathspider.observer.Observer and start its process
- Start the merger thread
- Start the configurator thread
- Start the worker threads
The number of worker threads to start was given when activating the plugin.
-
-
class
pathspider.base.
SynchronizedSpider
(worker_count, libtrace_uri, args)[source]¶ -
configurator
()[source]¶ Thread which synchronizes on a set of semaphores and alternates between two system states.
-
tcp_connect
(job)[source]¶ This helper function will perform a TCP connection. It will not perform any special action in the event that this is the experimental flow; it only performs a TCP connection. This function expects that self.conn_timeout has been set to a sensible value.
-
worker
(worker_number)[source]¶ This function provides the logic for configuration-synchronized worker threads.
Parameters: worker_number (int) – The unique number of the worker.
The workers operate as continuous loops:
- Fetch next job from the job queue
- Perform pre-connection operations
- Acquire a lock for “config_zero”
- Perform the “config_zero” connection
- Release “config_zero”
- Acquire a lock for “config_one”
- Perform the “config_one” connection
- Release “config_one”
- Perform post-connection operations for config_zero and pass the result to the merger
- Perform post-connection operations for config_one and pass the result to the merger
- Do it all again
If the job fetched is the SHUTDOWN_SENTINEL, then the worker will terminate as this indicates that all the jobs have now been processed.
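The loop described above can be sketched in a few lines. This is a simplified illustration, not the PATHspider implementation. It uses two plain locks where the real spider coordinates with the configurator, passes the three hooks in as callables, and uses a stand-in sentinel object. All names other than SHUTDOWN_SENTINEL are hypothetical.

```python
import queue
import threading

SHUTDOWN_SENTINEL = object()  # stand-in for the real sentinel value


def worker(jobs, results, lock_zero, lock_one, pre_connect, connect, post_connect):
    while True:
        job = jobs.get()
        if job is SHUTDOWN_SENTINEL:
            break  # all jobs have been processed
        pcs = pre_connect(job)  # run once per job
        with lock_zero:
            conn_zero = connect(job, pcs, 0)
        with lock_one:
            conn_one = connect(job, pcs, 1)
        # Post-connection work happens outside the locks.
        results.put(post_connect(job, conn_zero, pcs, 0))
        results.put(post_connect(job, conn_one, pcs, 1))
```

Each job therefore produces two results, one per configuration, and only the connect calls run while a lock is held.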
-