Usage Overview
==============

PATHspider is run from the command line. The Observer needs permission to
capture raw packets from the network interface, so you will need to run
PATHspider with ``sudo`` or an equivalent.

.. code-block:: text

    # pathspider --help
    usage: pathspider [-h] [-s] [-i INTERFACE] [-w WORKERS] [--input INPUTFILE]
                      [--output OUTPUTFILE] [-v]
                      PLUGIN ...

    Pathspider will spider the paths.

    optional arguments:
      -h, --help            show this help message and exit
      -s, --standalone      run in standalone mode. this is the default mode (and
                            currently the only supported mode). in the future,
                            mplane will be supported as a mode of operation.
      -i INTERFACE, --interface INTERFACE
                            the interface to use for the observer
      -w WORKERS, --workers WORKERS
                            number of workers to use
      --input INPUTFILE     a file containing a list of remote hosts to test, with
                            any accompanying metadata expected by the pathspider
                            test. this file should be formatted as a comma-
                            seperated values file. Defaults to standard input.
      --output OUTPUTFILE   the file to output results data to. Defaults to
                            standard output.
      -v, --verbose         log debug-level output.

    Plugins:
      The following plugins are available for use:

        dscp                DiffServ Codepoints
        tls                 Transport Layer Security
        tfo                 TCP Fast Open
        ecn                 Explicit Congestion Notification
        dnsresolv           DNS resolution for hostnames to IPv4 and v6 addresses

    Spider safely!

Quickstart Example
------------------

You can run a small study using the ECN plugin and the included
``webinput.csv`` file to measure path transparency to ECN for a small
selection of web servers, saving the results in ``results.txt``:

.. code-block:: shell

    pathspider -i eth0 ecn < webinput.csv > results.txt

.. note:: If you have not installed PATHspider from apt, you will find the
          ``webinput.csv`` example file in the ``examples`` folder of the
          source distribution.

Data Formats
------------

PATHspider uses newline delimited JSON (ndjson) for the output format.
At present, the input format is CSV, although future versions will deprecate
CSV input in favour of an ndjson input format to unify the data formats.

The ndjson format gives flexibility in the actual contents of the data, as
different tests have different needs. Some data must simply remain associated
with a job so that it is present in the final output, for example the Alexa
ranking of a webserver. Other data may be used as part of the test itself, for
example the domain for which a server should be authoritative when running
tests against authoritative DNS servers.

Job List
~~~~~~~~

The standalone runner expects a CSV file as input, with one line per job. The
format for each line should be as follows::

    target_ip,target_port,target_hostname,target_rank

The current input format is optimised for the use case of testing the Alexa
top 1 million webservers, and so includes a value for the job's ranking in
that list. This value is opaque to PATHspider and may be set to any string, or
to ``0`` if it is not required.

If the ``target_port`` is not a valid integer, the job will be skipped and a
warning emitted by the logger. Blank lines are permitted and will be ignored
by the job feeder.

.. _defaultoutput:

Output Format
~~~~~~~~~~~~~

PATHspider outputs two records per job, each a JSON object. One record is for
the baseline (A) connection and one for the experimental (B) connection. These
records contain the original job information, any information added by the
connection functions and any information added by the Observer.

The connection logic of all the plugins that ship with the PATHspider
distribution sets a ``config`` value of either 0 or 1 (0 being baseline, 1
being experimental) to distinguish the flows.

Due to the highly parallel nature of PATHspider, the two flows for a
particular job may not be output together and may have other flows between
them.
Any analysis tools will need to take this into account.

The plugins that ship with the PATHspider distribution will also have the
following values set in their output:

+------------+----------------------------------------------------------------+
| Key        | Description                                                    |
+============+================================================================+
| config     | 0 for baseline, 1 for experimental                             |
+------------+----------------------------------------------------------------+
| connstate  | True if the connection was successful, False if the connection |
|            | failed (e.g. due to timeout).                                  |
+------------+----------------------------------------------------------------+
| dip        | Layer 3 (IPv4/IPv6) destination address                        |
+------------+----------------------------------------------------------------+
| sp         | Layer 4 (TCP/UDP) source port                                  |
+------------+----------------------------------------------------------------+
| dp         | Layer 4 (TCP/UDP) destination port                             |
+------------+----------------------------------------------------------------+
| pkt_fwd    | A count of the number of packets seen in the forward direction |
+------------+----------------------------------------------------------------+
| pkt_rev    | A count of the number of packets seen in the reverse direction |
+------------+----------------------------------------------------------------+
| oct_fwd    | A count of the number of octets seen in the forward direction  |
+------------+----------------------------------------------------------------+
| oct_rev    | A count of the number of octets seen in the reverse direction  |
+------------+----------------------------------------------------------------+

For detail on the values in individual plugins, see the section for that
plugin later in this documentation.
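
Because the two records for a job can be separated by other flows, analysis
code usually has to regroup them before comparing baseline and experimental
results. The following is a minimal sketch and not part of PATHspider: the
function name ``pair_flows`` is hypothetical, and it assumes that each
``(dip, dp)`` target appears in only one job, so that these two keys are
enough to identify the job.

.. code-block:: python

    import json
    from collections import defaultdict


    def pair_flows(lines):
        """Group ndjson records into (baseline, experimental) pairs.

        Assumption (not guaranteed by PATHspider): a (dip, dp) target is
        tested by at most one job, so it can serve as the grouping key.
        """
        jobs = defaultdict(dict)
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the results file
            record = json.loads(line)
            key = (record["dip"], record["dp"])
            # "config" is 0 for the baseline flow, 1 for the experimental flow
            jobs[key][record["config"]] = record

        # Keep only the jobs for which both flows have been seen
        return {
            key: (flows[0], flows[1])
            for key, flows in jobs.items()
            if 0 in flows and 1 in flows
        }

Any iterable of lines can be passed in, for example an open file object for
``results.txt``. Jobs with only one record so far are left out of the result,
which matters when reading output from a run that is still in progress.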