Bittorrent is a well-established protocol and tool for peer-to-peer distribution of files. It is frequently used in large scale infrastructures for distributing content in a highly-efficient and exceptionally fast manner. I decided to write a general-purpose Chef bittorrent cookbook for providing BitTorrent Resources via a Lightweight Resource Provider (LWRP). While there was already a very useful transmission cookbook for downloading torrent files, I wanted to create LWRPs and a set of recipes that made it simple to seed and peer a file with minimal interaction.
Even though there are a tremendous number of BitTorrent applications available, I had 2 requirements: trackerless seeding and the ability to be easily daemonized. After researching quite a few clients, aria2 was found to have the required features and turned out to work quite well. Trackerless torrents proved to be a poorly supported and/or documented feature for most tools, the key to using this with aria2 was to understand the need for seeding node to expose the distributed hash table (DHT) on a single port for ease of use and to include this in the creation of the torrent itself.
In testing and benchmarking with a 4.2 gigabyte file (CentOS 6.2 DVD 1) on EC2 with 11 m1.smalls (1 seed and 10 peers), there were a number of interesting results. The chef-client run averaged about 8 and a half minutes (with download speeds around 11.4MiB/s) for the 10 nodes, this stayed fairly constant when moving to 20 nodes.
A separate test was done distributing the file with Apache as well. Not surprisingly the more nodes that were added, the slower the downloads became to the point where some failed because of timeouts. Apache could probably be configured to handle the scenario better, but this is why we use a peer-to-peer solution to avoid the single source. The average chef-client run was about 20 minutes for 10 nodes, which was twice as slow as the same test with 3 nodes.
There are definite bottlenecks on EC2, either on the filesystem or at the network level which is to be expected in a virtualized environment. File allocations on some machines take an order of magnitude longer than others, and some nodes are extremely slow in networking. With the larger test case of 20 nodes, some were even faster than with 10 nodes, a few outliers were exceptionally slow (as seen with any large sample of EC2 nodes). On my gigabit network with physical nodes, depending on the downloading drive (SSD or RAID-0 drives), I doubled these speeds with just 5 nodes. This would indicate that the drives or filesystems are the bottleneck.
TRACKERLESS TORRENTS WITH DHT
The first use case I wanted to get working was trackerless-torrents. To create the torrent we use the mktorrent package. For trackerless seeding from 10.0.0.10, we used the following command:
mktorrent -d -c \"Generated with Chef\" -a node://10.0.0.10:6881 -o test0.torrent mybigfile.tgz
To run aria2 as a trackerless seeder in the foreground on 10.0.0.10, it is important to identify the DHT and listening ports (UDP and TCP respectively).
aria2c -V --summary-interval=0 --seed-ratio=0.0 --dht-file-path=/tmp/dht.dat --dht-listen-port 6881 --listen-port 6881 --d/tmp/ test0.torrent
To run aria2 as a peer of a trackerless torrent on 10.0.0.10, you have to specify the “–dht-entry-point”.
aria2c -V --summary-interval=0 --seed-ratio=0.0 --dht-file-path=/tmp/dht.dat --dht-listen-port 6881 --listen-port 6881 --dht-entry-point=10.0.0.10:6881 --d/tmp/ test0.torrent
This technique works, but has the limiting factor of needing to transfer the torrent file between machines. This is solved in the bittorrent cookbook by storing the torrent file in a data bag (future versions of the cookbook may support magnet URIs to remove the need for a file completely).
LIGHTWEIGHT RESOURCE PROVIDERS
Once I had identified the commands that worked for these operations, they needed to be encapsulated in a bittorrent cookbook with LWRPs for creating torrents, seeding and peering the files.
bittorrent_torrent: Given a file it creates a .torrent file for sharing a local file or directory via the [BitTorrent protocol](http://en.wikipedia.org/wiki/BitTorrent).
bittorrent_seed: Share a local file via the [BitTorrent protocol](http://en.wikipedia.org/wiki/BitTorrent).
bittorrent_peer: Downloads the file or files specified by a torrent via the [BitTorrent protocol](http://en.wikipedia.org/wiki/BitTorrent). Update notifications are triggered when a blocking download completes and on the initiation of seeding. There are also options on whether to block on the download and whether to continue seeding after download.
The recipes were provided as an easy way to use bittorrent to share and download files simply by passing the path and filename via
['bittorrent']['path'] attributes. There are recipes to seed, peer and to stop the seeding and peering.
bittorrent::seed: given the
['bittorrent']['path'] attributes it will create a .torrent file for the file(s) to be distributed, store it in the `bittorrent` data bag and start seeding the distribution of the file(s).
bittorrent::peer: given the
['bittorrent']['path'] attributes it will look for a torrent in the bittorrent data bag that provides that file. If one exists, it will download the file and continue seeding depending on the value of the
['bittorrent'']['seed'] attribute (false by default).
bittorrent::stop: stops either the seeding or peering of a file.
I plan on continuing development on this cookbook as it gets used in production environments and would appreciate any feedback or patches. Right now it is Ubuntu-only, but adding support for RHEL/CentOS is on the short-term roadmap and requires finding sources for the `mktorrent` and `aria2` packages. Using magnet URIs instead of a torrent file would probably be more efficient as well, since it would remove the distribution of the torrent file and allow the use of search to specify multiple seeders to prime the DHT.