Support BitTorrent v2 spec: BEP 52 #175

Open
ewhal opened this issue Aug 9, 2017 · 38 comments

Comments

@ewhal

ewhal commented Aug 9, 2017

arvidn/libtorrent#2197
atomashpolskiy/bt#28

@anacrolix
Owner

But why?

@christian-roggia

Here you will find a good reddit TL;DR about the changes and improvements of The BitTorrent Protocol Specification v2:
https://www.reddit.com/r/programming/comments/6safyl/the_bittorrent_protocol_specification_v2/

For more detailed information read the bittorrent.org BEP:
http://bittorrent.org/beps/bep_0052.html

The most important improvements are:

  • Introduction of SHA256 and globally unique hashes that will prevent hash collisions and therefore all known hash collision attacks
  • Merkle tree for hashing pieces instead of chunk hashes

I'm not sure this is really needed right now, but it is definitely something that should sooner or later be fully implemented (with the help of the community).
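To make the Merkle tree point concrete, here is a minimal Go sketch of how a per-file root could be computed along the lines BEP 52 describes: SHA-256 over 16 KiB blocks, the leaf layer padded with all-zero hashes up to a power of two, then reduced pairwise. The names `merkleRoot` and `hashPair` are hypothetical, and returning a zero hash for empty input is a choice made only for this sketch (BEP 52 gives empty files no root at all).

```go
package main

import "crypto/sha256"

// blockSize is the 16 KiB leaf size BEP 52 uses for the per-file hash tree.
const blockSize = 16 * 1024

// hashPair returns SHA-256(left || right), the parent of two tree nodes.
func hashPair(left, right [32]byte) [32]byte {
	var buf [64]byte
	copy(buf[:32], left[:])
	copy(buf[32:], right[:])
	return sha256.Sum256(buf[:])
}

// merkleRoot hashes data in 16 KiB blocks, pads the leaf layer with all-zero
// hashes up to the next power of two, and reduces it pairwise to one root.
func merkleRoot(data []byte) [32]byte {
	var leaves [][32]byte
	for off := 0; off < len(data); off += blockSize {
		end := off + blockSize
		if end > len(data) {
			end = len(data) // the final block is hashed at its real length
		}
		leaves = append(leaves, sha256.Sum256(data[off:end]))
	}
	if len(leaves) == 0 {
		// Assumption for this sketch only: empty input yields a zero hash.
		return [32]byte{}
	}
	// Pad with zero hashes (not hashes of zero data) to a power of two.
	for len(leaves)&(len(leaves)-1) != 0 {
		leaves = append(leaves, [32]byte{})
	}
	// Combine neighbours until a single root remains.
	for len(leaves) > 1 {
		next := make([][32]byte, 0, len(leaves)/2)
		for i := 0; i < len(leaves); i += 2 {
			next = append(next, hashPair(leaves[i], leaves[i+1]))
		}
		leaves = next
	}
	return leaves[0]
}
```

Because the leaves are fixed-size blocks rather than pieces, the root computed this way does not depend on the piece length a torrent creator picked.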

@anacrolix changed the title from "Adopt bittorrent 2 spec" to "Support BitTorrent v2 spec: BEP 52" on Sep 8, 2020
@anacrolix
Owner

A useful link from a related issue: https://blog.libtorrent.org/2020/09/bittorrent-v2/.

@ghost

ghost commented May 26, 2021

@anacrolix when will you support this?

@anacrolix
Owner

Not until I see some uptake in the wild. Currently it appears to be zero.

@KyleSanderson

Any update on this one? This is absolutely "in the wild".

@anacrolix
Owner

I am considering this, there's been an uptick in the community, I think usage might start to appear.

Is there any interest in funding this support?

@KyleSanderson

> I am considering this, there's been an uptick in the community, I think usage might start to appear.
>
> Is there any interest in funding this support?

Specifically I have a PR into a project that already uses this library. autobrr/autobrr#491

This breaks parsing the contents on a handful of sites, because the clients support v2 well but the library does not. As v2 has become widely accepted and mainstream over the last year, this is only going to become more prevalent as time progresses.

So, similar situation to you, as a user there's pain. I have an old parser for v1 I can dust off, but then the maintenance burden shifts.

@anacrolix anacrolix pinned this issue Oct 17, 2022
@kovalensky

When? Tell me honey, when? I'm just too hungry for this, give me, give me that juicy BitTorrent v2 support, make it faster, deeper in implementation details, make it swim inside my web server, take all of my resources, make it bigger.

But seriously though, 6 years have passed. Everyone has put their head in the sand, saying they need to see others try it first, but the others say the same, and as a result users are losing out on a great feature.

@fiatjaf

fiatjaf commented Feb 9, 2023

How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.

@KyleSanderson

> How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.

That's not how the world works.

Transmission v4 just went GA with BT2 support, which means that every major client has full support for this now.

@fiatjaf

fiatjaf commented Feb 9, 2023

Thank you for teaching me how the world works with your 6-year protocol update. Since you know how the world works, tell me for how many years the BT1 protocol will still be supported alongside BT2 because some people and apps won't switch.

@anacrolix
Owner

I've not said I won't do it, but time is money and I have a lot to do. BEP 52 messes with the most fundamental part of BitTorrent so a transition will always be difficult. BEP 52 isn't really worthwhile for the most traditional use case for BitTorrent, downloading everything up front. The story is very different for indexers, search, and ephemeral use of BitTorrent for example in hosting websites, streaming etc. Again, I can accelerate this with some support, and/or can always take well considered PRs.

@kovalensky

kovalensky commented Feb 11, 2023

> > How about not doing breaking changes in open protocols next? They should just delete BEP 52 and forget it has ever existed.
>
> That's not how the world works.
>
> Transmission v4 just went GA with BT2 support, which means that every major client has full support for this now.

This is great news, but the main task right now is not implementing the protocol itself so much as implementing its promised features, such as cross-swarm seeding with clients keeping a hash db in memory. Clients searching the DHT could ask other clients whether they have files with a given root hash, and if the requested clients find it in their hash db, they would seed that file, reviving dead torrents and increasing connectivity and decentralization.
The other feature was deduplication of files, like swarm merging: reducing their size on disk by storing the relationships in the hash db.

It's just that, as in most open-source projects, there are more great features than manpower to build them.

@balupton

Could those with the ability to implement this propose a budget that they would need? Perhaps it can be crowd-funded.

@anacrolix
Owner

It would take me about 2-3 weeks but could extend a bit more. It would likely spill over into anacrolix/dht a bit, require some special flags for bencoding, probably some refactoring of storage. There might also be some refactoring of peer connections, to handle the different hash/info requests. There would need to be a fair few tests. Maybe $3k?

@kovalensky

kovalensky commented Feb 14, 2023

> Could those with the ability to implement this propose a budget that they would need? Perhaps it can be crowd-funded.

I don't mean to offend @anacrolix, but the current situation is that we need this to be implemented in something popular to get others' attention to do it.
I wrote to the libtorrent maintainer. His library is used in qBittorrent, the second most popular open-source client, and in Deluge, and he has co-written a lot of BEPs. He said he would help with a PR, but his limiting factor right now is time, and he is rewriting the caching mechanisms, so even a bounty would not help. Therefore I am looking for a C++/Boost developer to discuss the task further and start crowdfunding. I'll keep this topic updated.

@anacrolix
Owner

Maybe I've missed something, but this GitHub issue tracks BitTorrent v2 support in anacrolix/torrent, which is written in Go. Completion of this issue will be when anacrolix/torrent implements BEP 52.

@izissise

izissise commented Nov 5, 2023

Hello, is there any news on this?

@anacrolix
Owner

This is blocked on funding currently. The details are in an earlier comment.

@anacrolix
Owner

I'm pretty keen to do this but I can't justify working on it.

@kovalensky

kovalensky commented Nov 17, 2023

> I'm pretty keen to do this but I can't justify working on it.

[Image: IMG_20231117_235300_431]

@cowtoolz

cowtoolz commented Dec 24, 2023

Would be happy to help implementing this as I've been given a worthwhile amount of grant money to explore BEP 52 in 2024.

Edit: please contact me after the holidays. Happy 2024, everyone—hopefully we can get cracking on this soon.

@anacrolix
Owner

@weebney what's your interest in supporting this development? I can't find any PM details, please use mine if there's a private aspect to your contribution.

I have a sponsor for this feature, I intend to start work on it soon. Additional sponsorship and help is welcome.

I've also updated the IssueHunt details, #138 (Webtorrent support) was previously funded in a similar way.

@cowtoolz

cowtoolz commented Jan 4, 2024

If you've got this covered, I'm likely to put my effort into other projects that need attention. Let me know if you need any help though—would be happy to contribute here on a personal basis.

@anacrolix anacrolix self-assigned this Jan 13, 2024
@anacrolix
Owner

anacrolix commented Feb 26, 2024

Development on this has begun at https://github.com/anacrolix/torrent/tree/bittorrent-v2.

Some interesting things I've discovered since working on this:

Other major clients (Transmission 4, qBittorrent 4.4+) claim to have BitTorrent v2 support. It does not work in my testing. Both of those clients can consume hybrid torrents. They do not work with exclusively v2 torrents. Neither supports creating hybrid or v2 torrents either (which is weird because qBittorrent claims to support it but I definitely don't see it). I'm using Transmission 4.0.5 and qBittorrent 4.6.3.

There's a lot of confusion about needing "special" support by trackers/DHT etc. My reading is that's all nonsense. You could add support in trackers to automatically combine swarms but it would not be trivial. If the client is aware of multiple swarms it can do this for both DHT and trackers. I have numerous downstream projects that rely on a single infohash and I believe there's a simple way to migrate this forward.
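As a rough illustration of why existing trackers and the DHT can carry both swarms, here is a small Go sketch that derives the two 20-byte identifiers a hybrid torrent can be announced under. It assumes `infoBytes` is the exact bencoded info dictionary, and it follows BEP 52's rule that the SHA-256 infohash is truncated to 20 bytes wherever the older 20-byte field is expected. The `swarmKeys` type and `hybridSwarmKeys` function are hypothetical names, not part of anacrolix/torrent.

```go
package main

import (
	"crypto/sha1"
	"crypto/sha256"
)

// swarmKeys holds the two identifiers a hybrid torrent can be announced under.
type swarmKeys struct {
	V1 [20]byte // SHA-1 of the bencoded info dictionary
	V2 [20]byte // SHA-256 of the same bytes, truncated to 20 bytes
}

// hybridSwarmKeys derives both keys from the raw bencoded info dictionary.
// The caller must pass the bytes exactly as they appear in the .torrent file;
// re-encoding the dictionary could change the hashes.
func hybridSwarmKeys(infoBytes []byte) swarmKeys {
	var k swarmKeys
	k.V1 = sha1.Sum(infoBytes)
	full := sha256.Sum256(infoBytes)
	copy(k.V2[:], full[:20]) // keep only the first 20 bytes for announces
	return k
}
```

A client that knows both keys can announce to each one separately and merge the resulting peer lists itself, which is the client-side approach described above.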

The file piece alignment sounds great, but it is a total nuisance trying to rejig 10+ years of structuring everything around pieces rather than files. It's likely still a net win. There's a lot of concern about it being inefficient for lots of small files. I read that as files smaller than the piece length, but I think actually it's only files that are smaller than the block size, 16 KiB, which is much more palatable.

@kovalensky

kovalensky commented Feb 27, 2024

> which is weird because qBittorrent claims to support it but I definitely don't see it

BitTorrent v2 was added to libtorrent in the 2.* branch. When downloading qBittorrent from Fosshub, you have to choose the "lt20, qt6" version.
They said they will drop lt12 in 5.0.0, as well as Windows 7 support.

> There's a lot of confusion about needing "special" support by trackers/DHT etc.

One wave of confusion started when people didn't find a way to express themselves in terms of trackers (public, private, and public with rating). For reference, Rutracker doesn't allow hybrids, because of padding files in the file list (the fix for this is two lines of code, btw) and double announces.

Nor does it support v2-only torrents, because that case needs specific code. They are based on an old version of the TorrentPier engine, which is the first engine supporting BitTorrent v2 (stats, file hashes display, etc.), but they will never update due to the custom modifications they made.

In general, the discrediting of the protocol started when developers of private and public-with-rating trackers, out of laziness about implementing it (I can relate though, it took me two weeks of debugging just to re-implement a v2-compatible announcer with stats), started to come up with counterarguments, and it accelerated with exaggerations from users completely unfamiliar with the protocol.

@anacrolix
Owner

> BitTorrent v2 was added to libtorrent in the 2.* branch. When downloading qBittorrent from Fosshub, you have to choose the "lt20, qt6" version. They said they will drop lt12 in 5.0.0, as well as Windows 7 support.

Thanks! Very helpful. I see the lt20 version you mentioned, it's also available in homebrew.

> One wave of confusion started when people didn't find a way to express themselves in terms of trackers (public, private, and public with rating). For reference, Rutracker doesn't allow hybrids, because of padding files in the file list (the fix for this is two lines of code, btw) and double announces.

Yeah it's not been trivial to fix up the assumption that files are packed. It's also harder because v2 hashes unpadded pieces, whereas v1 hashes including the padding files. So pieces essentially have different length depending on whether a torrent is v1 or v2.
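A small Go sketch of the length mismatch described above, under stated assumptions: the file is non-empty and, in the v1 view, is followed by a pad file that aligns the next file to a piece boundary. The function names `padAfter` and `lastPieceHashedLen` are hypothetical and only illustrate the arithmetic.

```go
package main

// padAfter returns the size of the pad file a padded v1/hybrid layout needs
// after a file of fileLen bytes so the next file starts on a piece boundary.
func padAfter(fileLen, pieceLen int64) int64 {
	return (pieceLen - fileLen%pieceLen) % pieceLen
}

// lastPieceHashedLen returns how many bytes feed the hash of a file's final
// piece. In the padded v1 view the pad bytes are hashed too, so the piece is
// always full; in the v2 view only the file's own data is hashed.
// Assumes fileLen > 0 and that the file is not the last one in the torrent.
func lastPieceHashedLen(fileLen, pieceLen int64) (v1, v2 int64) {
	rem := fileLen % pieceLen
	if rem == 0 {
		return pieceLen, pieceLen // file ends exactly on a piece boundary
	}
	return pieceLen, rem
}
```

For example, with a 32 KiB piece length a 100000-byte file leaves 1696 bytes in its final piece, so the v2 hash covers 1696 bytes of data while the v1 hash covers those bytes plus 31072 bytes of padding.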

> In general, the discrediting of the protocol started when developers of private and public-with-rating trackers, out of laziness about implementing it (I can relate though, it took me two weeks of debugging just to re-implement a v2-compatible announcer with stats), started to come up with counterarguments, and it accelerated with exaggerations from users completely unfamiliar with the protocol.

That's handy to know. I have gotten it working, but there are lots of corner cases I expect to need to smooth over. It's complex enough that I'm starting to think Go is not sufficient for this, it would have been much easier to port a client in Rust.

The branch above can now download both hybrid and pure v2 torrents.

go install github.com/anacrolix/torrent/cmd/torrent@bittorrent-v2
wget https://libtorrent.org/bittorrent-v2-test.torrent
torrent download bittorrent-v2-test.torrent

Similarly you should be able to download the hybrid torrent at https://blog.libtorrent.org/2020/09/bittorrent-v2/, or hybrid torrents that are available elsewhere (they're much easier to find than pure v2 torrents).

It's very early days, and there will definitely be crashes and bugs. I also expect that pure v2 magnet links won't work yet.

@anacrolix
Owner

Forgot to push the updated branch https://github.com/anacrolix/torrent/tree/bittorrent-v2. It's now pushed.

@anacrolix
Owner

Support is now in master. Please try it out. It's not complete, but hybrid and pure v2 torrents should now be supported. There's missing support in some tooling. There's no hybrid or v2 torrent creator. Some of the storage backends may do the wrong thing. There's a few shortcuts taken in the protocol for now, they should improve over time.

@cowtoolz

cowtoolz commented Mar 9, 2024

Thanks @anacrolix
Let me know if you need any support; I'm back to "real" work soon and should still have some time to put towards a more complete implementation

@anacrolix
Owner

anacrolix commented Mar 9, 2024

If you have a downstream project, update to master and let me know if anything isn't working.

A few good hybrid and pure v2 torrents would be useful as test cases too. I have pinched a few from my DHT indexer which I've updated with v2 support, but there aren't many popular pure v2 torrents in the wild.

It would also be nice to have someone go over and/or test the dual-swarm support: check that announces are working on both v1 and v2 simultaneously for trackers and DHT.

I've done zero testing of BitTorrent v2 with WebRTC. There's very little extra needed to link it in if at all. I have no idea what the state of v2 support is in webtorrent.

Not yet supported:

  • Replying to hash request
  • Handling hash reject
  • Asking for proof layers in outbound hash request
  • Handling BEP 52's piece reject independently of BEP 6
  • Creating hybrid and v2 torrents
  • Handling infohash upgrades during handshake
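For readers unfamiliar with the proof layers mentioned in the list above, the following Go sketch shows the general idea of checking one hash against a known root using sibling ("uncle") hashes. It is a generic Merkle proof check, not the BEP 52 wire format and not code from anacrolix/torrent; `verifyProof` and `hashPair` are hypothetical names, and the uncles are assumed to be ordered from the lowest layer upward.

```go
package main

import "crypto/sha256"

// hashPair returns SHA-256(left || right), the parent of two tree nodes.
func hashPair(left, right [32]byte) [32]byte {
	var buf [64]byte
	copy(buf[:32], left[:])
	copy(buf[32:], right[:])
	return sha256.Sum256(buf[:])
}

// verifyProof recomputes the root from a node hash, the node's index within
// its layer, and the sibling hashes on the path to the root (lowest first).
// It reports whether the recomputed value equals the expected root.
func verifyProof(leaf [32]byte, index int, uncles [][32]byte, root [32]byte) bool {
	cur := leaf
	for _, u := range uncles {
		if index%2 == 0 {
			cur = hashPair(cur, u) // current node is a left child
		} else {
			cur = hashPair(u, cur) // current node is a right child
		}
		index /= 2 // move up one layer
	}
	return cur == root
}
```

With such a check a peer can validate a received hash against the root stored in the torrent before trusting any data verified with it.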

Some cool things I'd like to have:

  • Provide a torrent upgrader (take a storage as an input, produce a hybrid or v2 torrent that's as identical as possible)

@anacrolix
Owner

Support for BitTorrent v2 was included in v1.56.0, thanks to a generous sponsor. The above features aren't yet implemented, but aren't necessary for compatibility with the network. I'm open to more sponsorship to complete some of those.

@balupton

balupton commented Jul 6, 2024

Posting my particular ideal use case here for bittorrent v2

I have various libre-licensed folders comprised of stuff from archive.org and whatnot that I've accumulated over the years. Most of the stuff would definitely be valuable to seed; however, BitTorrent v1 means I have to create a torrent for each specific and correct folder and file structure within. My goal would be to just list the directory as available via BitTorrent v2, and it then seeds everything to others, which is possible because hashing in BitTorrent v2 is per individual file. This media library directory will remain mutable on my file system, and the BitTorrent v2 client will make sure the BitTorrent v2 listing is up to date. The goal isn't so much publicly listing the entire media library, but to give back and continue seeding the contents of the library to those already requesting any such files that I may have, while my actual filesystem remains mutable.

The specific env for this will be a raspberry pi 5 machine, probably running https://github.com/varbhat/exatorrent for new content acquisition /cc varbhat/exatorrent#406

@kovalensky

kovalensky commented Jul 6, 2024

@balupton Oh mate, if only you knew how dire the situation is in this direction.

This is a long-standing wish of BitTorrent users, not implemented anywhere yet; many dead swarms could be alive today with this.

v2 allows it, but one client implementing this wouldn't be enough. There should be a BEP, or at least a de facto standard, like the one in IPFS (a gossip protocol, I assume), so other devs could seamlessly integrate it.

@izissise

izissise commented Jul 6, 2024

I have a similar use case: writing a plugin for nbdkit that would keep only the currently used FS blocks in RAM.

I have a proof of concept here https://github.com/Wuageorg/nbdkit/blob/t0rrent/plugins/golang/examples/t0rrent/t0rrent.go

@anacrolix
Owner

Thanks for sharing this balupton. This is actually a common misunderstanding of what BitTorrent v2 brings. It does improve the hashing system so that such an implementation would be easier and more efficient, but the real blocker is that BitTorrent partitions swarms on infohashes. This I think was originally just a natural choice as torrents were just a single file, but now it's more like a happy coincidence that provides a significant source of performance in BitTorrent vs for example IPFS.

If you announce content at a granularity smaller than the torrent, you create an enormous amount of overhead to maintain your availability on the network for content. The trade off is in your block size. The current "block size" is essentially the entire torrent. You can create extra layers, for example gossiping and taking advantage of the fact that peers that share common data are likely to share more related data.

BitTorrent v2 significantly improved the situation where you have multiple torrents (explicitly added) with overlapping data. The main reason for this is that piece size is not a factor in hashes, and there's a strict guarantee that files do not share blocks (this was supported in V1 but not universally implemented). A client can implement storage that intentionally takes advantage of this (anacrolix/torrent is a client that has been designed for this).
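As a loose illustration of storage that takes advantage of this, here is a Go sketch of an index that maps file content to a single on-disk location, so that the same file appearing in several explicitly added torrents is stored once. It is not the anacrolix/torrent storage API: the `fileKey` and `sharedStore` types are hypothetical, and keying on the v2 per-file root plus the file length is an assumption made for this sketch.

```go
package main

// fileKey identifies file content independently of any torrent or piece size.
// Assumption: Root is the file's v2 Merkle root and Length its size in bytes.
type fileKey struct {
	Root   [32]byte
	Length int64
}

// sharedStore remembers where each distinct file lives on disk and how many
// torrents currently reference it.
type sharedStore struct {
	paths map[fileKey]string
	refs  map[fileKey]int
}

// newSharedStore returns an empty index.
func newSharedStore() *sharedStore {
	return &sharedStore{
		paths: make(map[fileKey]string),
		refs:  make(map[fileKey]int),
	}
}

// add registers a file belonging to some torrent. If identical content is
// already known, the existing path is returned and shared is true; otherwise
// proposedPath is recorded as the file's location.
func (s *sharedStore) add(key fileKey, proposedPath string) (path string, shared bool) {
	s.refs[key]++
	if existing, ok := s.paths[key]; ok {
		return existing, true
	}
	s.paths[key] = proposedPath
	return proposedPath, false
}
```

A real implementation would also need to handle removal, partially downloaded files, and verification before sharing data between torrents.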

You should also be aware of BEP46 (again anacrolix/* supports this and has used it in production). If you can handle a single publisher you can evolve torrents over time, and with a client with smart storage you can efficiently map data to newer versions of torrents.

Another solution is just to keep adding newer versions of torrents that you want to support, such as from a feed, and use the above smart torrent storage. Your client will just make itself available in all the related swarms (you need to know what those all are).

It's my observation that the "all content should be addressable" thing comes up frequently and leads to long forays away from what BitTorrent excels at, and from what its model of decentralised data works well for: the resilience of data is proportional to its popularity. If you can't find a popular "bookmark" to the content, the associated data will die. The end result is very long lived, high quality content, and short lived, very popular content.

I've operated a system that scales in a similar fashion to archive.org. I think the point I'm trying to make is that using BitTorrent for archive storage is an extremely long tail feature, where it helps when there's short bursts in popularity, but you can see they mostly rely on webseeding and centralisation to locate the torrents and bootstrap the data.

The opposite possibility is a system where you can do a decentralized search on a DHT and get back merkle tree pointers to the files you want. Then you could look up the trees and peers holding all the data on demand. I don't think anyone has solved that yet. I'd be willing to try (and I've tackled parts of it in the past), but I imagine I'd end up with another tyre to throw on the fire with IPFS and all its lesser known alternatives.

@anacrolix
Owner

Re the last comment I think we could start a productive separate discussion to go over any interest in implementing the mentioned concerns on BitTorrent, including any existing attempts and research that exist. Feel free to create this discussion in anacrolix/torrent.

An update on these:

Not yet supported:

  • Replying to hash request
  • Handling hash reject
  • Asking for proof layers in outbound hash request
  • Handling BEP 52's piece reject independently of BEP 6
  • Creating hybrid and v2 torrents: I've realised this is probably the most important. It creates forward compatibility for all torrents made by this implementation.
  • Handling infohash upgrades during handshake: I think I might have already done this, I need to check.
