Usage Overview
You can run PATHspider from the command line. For the Observer to work, you
need permission to capture raw packets from the network interface. In practice
this means running PATHspider with sudo or an equivalent.
# pathspider --help
usage: pathspider [-h] [-s] [-i INTERFACE] [-w WORKERS] [--input INPUTFILE]
[--output OUTPUTFILE] [-v]
PLUGIN ...
Pathspider will spider the paths.
optional arguments:
-h, --help show this help message and exit
-s, --standalone run in standalone mode. this is the default mode (and
currently the only supported mode). in the future,
mplane will be supported as a mode of operation.
-i INTERFACE, --interface INTERFACE
the interface to use for the observer
-w WORKERS, --workers WORKERS
number of workers to use
--input INPUTFILE a file containing a list of remote hosts to test, with
any accompanying metadata expected by the pathspider
test. this file should be formatted as a comma-
seperated values file. Defaults to standard input.
--output OUTPUTFILE the file to output results data to. Defaults to
standard output.
-v, --verbose log debug-level output.
Plugins:
The following plugins are available for use:
dscp DiffServ Codepoints
tls Transport Layer Security
tfo TCP Fast Open
ecn Explicit Congestion Notification
dnsresolv DNS resolution for hostnames to IPv4 and v6 addresses
Spider safely!
Quickstart Example
You can run a small study using the ECN plugin and the included webinput.csv
file to measure path transparency to ECN for a small selection of web servers
and save the results in results.txt:
pathspider -i eth0 ecn </usr/share/doc/pathspider/examples/webinput.csv >results.txt
Note
If you have not installed PATHspider from apt, you will find the webinput.csv example file in the examples folder of the source distribution.
Data Formats
PATHspider writes its output as newline-delimited JSON (ndjson). At present the input format is CSV, although future versions will deprecate CSV input in favour of ndjson so that input and output share one format. The ndjson format gives flexibility in the contents of each record, because different tests need different data. Some data only needs to stay associated with a job so that it appears in the final output, for example the Alexa ranking of a web server. Other data is used as part of the test itself, for example the domain for which an authoritative DNS server under test should be authoritative.
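As a rough illustration of that flexibility, the sketch below serialises two records that carry different extra fields, one JSON object per line, and reads them back. The field names `rank` and `domain` and the helper functions are made up for this example; they are not taken from PATHspider itself.

```python
import io
import json

# Hypothetical records: the extra fields ("rank", "domain") are illustrative
# assumptions, not a PATHspider schema.
records = [
    {"dip": "192.0.2.1", "dp": 80, "rank": "1"},
    {"dip": "192.0.2.53", "dp": 53, "domain": "example.com"},
]


def dump_ndjson(items):
    """Serialise each item as one JSON object per line."""
    return "".join(json.dumps(item) + "\n" for item in items)


def load_ndjson(stream):
    """Parse one JSON object from each non-blank line."""
    return [json.loads(line) for line in stream if line.strip()]


text = dump_ndjson(records)
parsed = load_ndjson(io.StringIO(text))
```

Because every line is an independent JSON object, records with different keys can share one file and be processed as a stream.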
Job List
The standalone runner expects a CSV file as input, with one line per job. The format for each line should be as follows:
target_ip,target_port,target_hostname,target_rank
The current input format is optimised for the use case of testing the Alexa top
1 million web servers, and so includes a value for the job's ranking in that
list. This value is opaque to PATHspider and may be set to any string, or to 0
if it is not required.
If the target_port is not a valid integer, the job will be skipped and a
warning emitted by the logger. Blank lines are permitted and will be ignored by
the job feeder.
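The rules above can be sketched in a few lines of Python. This is not PATHspider's own job feeder, only a minimal reading of the description: the function name `parse_jobs`, the logger name, and the choice to skip lines that do not have exactly four fields are assumptions made for the example.

```python
import csv
import io
import logging

logger = logging.getLogger("joblist_sketch")  # hypothetical logger name

FIELDS = ("target_ip", "target_port", "target_hostname", "target_rank")


def parse_jobs(stream):
    """Yield one dict per usable job line from a CSV stream."""
    for row in csv.reader(stream):
        # Blank lines are permitted and ignored.
        if not any(field.strip() for field in row):
            continue
        # Assumption: lines without exactly four fields are skipped too.
        if len(row) != len(FIELDS):
            logger.warning("skipping malformed line: %r", row)
            continue
        job = dict(zip(FIELDS, (field.strip() for field in row)))
        try:
            job["target_port"] = int(job["target_port"])
        except ValueError:
            # A non-integer port means the job is skipped with a warning.
            logger.warning("skipping job with invalid port: %r", row)
            continue
        yield job


sample = io.StringIO(
    "192.0.2.1,80,www.example.com,1\n"
    "\n"
    "192.0.2.2,http,www.example.org,0\n"
)
jobs = list(parse_jobs(sample))
```

Here only the first line yields a job: the blank line is ignored and the line with the port `http` is skipped.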
Output Format
PATHspider outputs two records per job, each as a JSON object. One record is for the baseline (A) connection and one is for the experimental (B) connection. These records contain the original job information, any information added by the connection functions, and any information added by the Observer.
The connection logic of all the plugins that ship with the PATHspider
distribution will set a config value, either 0 or 1 (with 0 being baseline,
1 being experimental) to distinguish flows. Due to the highly parallel nature
of PATHspider, the two flows for a particular job may not be output together
and may have other flows between them. Any analysis tools will need to take
this into consideration.
The plugins that ship with the PATHspider distribution will also have the following values set in their output:
Key | Description |
---|---|
config | 0 for baseline, 1 for experimental |
connstate | True if the connection was successful, False if the connection failed (e.g. due to timeout). |
dip | Layer 3 (IPv4/IPv6) destination address |
sp | Layer 4 (TCP/UDP) source port |
dp | Layer 4 (TCP/UDP) destination port |
pkt_fwd | A count of the number of packets seen in the forward direction |
pkt_rev | A count of the number of packets seen in the reverse direction |
oct_fwd | A count of the number of octets seen in the forward direction |
oct_rev | A count of the number of octets seen in the reverse direction |
For detail on the values in individual plugins, see the section for that plugin later in this documentation.
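Since the two flows for a job can be separated in the output, an analysis tool has to regroup them before comparing. The sketch below is one possible approach, not part of PATHspider: it assumes that `dip` and `dp` together identify a job (a target listed twice in the input would break this), and the function name `pair_flows` and the sample records are invented for illustration.

```python
import io
import json
from collections import defaultdict


def pair_flows(stream):
    """Group ndjson records by (dip, dp); keep jobs that have both configs."""
    by_job = defaultdict(dict)
    for line in stream:
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: (dip, dp) is unique per job in this result set.
        by_job[(record["dip"], record["dp"])][record["config"]] = record
    return {
        key: (flows[0], flows[1])
        for key, flows in by_job.items()
        if 0 in flows and 1 in flows
    }


# Invented sample output with the flows of two jobs interleaved.
sample = io.StringIO("\n".join(json.dumps(r) for r in [
    {"dip": "192.0.2.1", "dp": 80, "config": 0, "connstate": True},
    {"dip": "192.0.2.2", "dp": 80, "config": 0, "connstate": True},
    {"dip": "192.0.2.1", "dp": 80, "config": 1, "connstate": False},
    {"dip": "192.0.2.2", "dp": 80, "config": 1, "connstate": True},
]))

pairs = pair_flows(sample)
# Jobs where the baseline connected but the experimental flow did not.
broken = [key for key, (a, b) in pairs.items()
          if a["connstate"] and not b["connstate"]]
```

Jobs that appear in `broken` are candidates for path impairment, although the meaning of a failed experimental flow depends on the plugin in use.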