
Switch to Web Crypto API #28

Closed
rugk opened this issue Jul 9, 2016 · 44 comments · Fixed by #431

Comments

@rugk
Member

rugk commented Jul 9, 2016

Switch to Web Crypto API and provide sjcl only as a fallback.

Browser support is good.

@rugk rugk added this to the Next release milestone Jul 9, 2016
@rugk rugk mentioned this issue Jul 9, 2016
@rugk rugk removed this from the Next release milestone Jul 14, 2016
@rugk rugk mentioned this issue Jul 18, 2016
@elrido elrido added this to the review & refactor paste format milestone Aug 18, 2016
@rugk
Member Author

rugk commented Feb 17, 2017

Another thing we should do is minimize metadata. E.g. currently attachments and comments are encrypted in separate parts of the JSON. This allows the web server admin to see whether an attachment is present and how large it is.
Also, settings like burn-after-reading do not need to be known to the server and can be encrypted too.

As for comments, we will probably not be able to change this, due to the nature of comments (anyone can add another one at any time and we cannot re-encrypt the whole paste in this case). But for everything else, we should put all to-be-encrypted data into a plain-text JSON first and then encrypt that whole JSON.
I think with the new Uploader module I created while refactoring the JS this should be quite easy to change.


Additionally, all other data that is needed on the server (e.g. the expiration time) could currently be manipulated by the server. SJCL, however, already has a mechanism which can prevent this, so we should make use of it.

So two things:

  • all to-be-encrypted data in JSON --> encrypt whole JSON
  • all not-secret data must be authenticated (so it cannot be modified)

@cryptomilk
Collaborator

cryptomilk commented Feb 22, 2017

Paste Encryption

Just a suggestion

Data passed in

The following data is what we pass in

paste_password: UTF-8 string
paste_plaintext: UTF-8 text

Process data

Start with generating a one time password.

paste_otp = random(32) # 32 bytes
paste_passphrase = paste_otp

If paste_password is set:

paste_passphrase = paste_otp + bytes(paste_password)

Processing of the paste_plaintext:

If we want compression:

paste_datablob = zlib.compress(paste_plaintext)
paste_compression = 'zlib'

else

paste_datablob = paste_plaintext
paste_compression = 'none'

Key generation for encryption (PBKDF2)

kdf_salt = random(16) # 16 bytes
kdf_iterations = 256000
kdf_keysize = 256 # bits of resulting kdf_key

kdf_key = PBKDF2HMAC(SHA256, kdf_keysize, kdf_salt, paste_passphrase)

Encryption

cipher_algo = "aes"
cipher_mode = "gcm"

cipher_iv = random(16) # 128 bit
cipher_associated_data = ""
cipher_tag_size = 16

cipher = Cipher(AES(kdf_key), GCM(cipher_iv, cipher_associated_data), paste_datablob)

cipher_text = cipher.text
cipher_tag = cipher.tag


json_data = {"v": 2,
             "kdf": "pbkdf2",
             "hash": "sha256",
             "salt": base64(kdf_salt),
             "iter": kdf_iterations,
             "ks": kdf_keysize,
             "cipher": cipher_algo,
             "mode": cipher_mode,
             "iv": base64(cipher_iv),
             "adata": base64(cipher_associated_data),
             "ct": base64(cipher_text),
             "ts": cipher_tag_size,
             "tag": base64(cipher_tag),
             "compression": paste_compression}

URL

url = "<paste_url>/?<paste_id>#<base64(paste_otp)>

@cryptomilk
Collaborator

cryptomilk commented Feb 22, 2017

I think it should go into the wiki, so we can work together on this. I think information about the KDF and the hash used for the KDF should be passed around so it can be changed in the future. I've added a compression flag so you can turn compression off. Maybe we should send around information about compression.

However, we should move as much information as possible into the cipher_text. So the cipher_text should probably be a JSON structure and hold the plaintext, information about the compression algorithm, and whether it is compressed.

plaintext = {"datablob": paste_datablob,
             "compression": paste_compression,
             "burn": 0}

@rugk
Member Author

rugk commented Feb 22, 2017

Indeed I'd also argue for the cipher text to be a JSON. This is also easily parsable in JS.

@elrido
Contributor

elrido commented Aug 5, 2018

FYI: Started work on this one in a branch. First I'll replace as many SJCL (and Base64.js, while I'm at it) calls as possible, to prove that at least the deciphering stays compatible with the existing paste formats (that's why I added unit tests for both the legacy and current paste format).

Then we can work on the new format, replacing the flawed deflate library in the process. At that point we will have to be able to write both the legacy and the new format and read all three of these formats. The legacy format will still require loading Base64.js 1.7 and the rawdeflate/inflate libraries instead of pako, but ideally we would not need SJCL at all at that point (assuming that, using the webcrypto API, we are able to produce a cipher encoding identical to SJCL's) and can instead rely on browser makers doing their job.

@cryptomilk
Collaborator

Sounds great!

@elrido
Contributor

elrido commented Aug 15, 2018

While starting to learn about webcrypto's "subtle" API, I did find both the required PBKDF2 and AES-GCM interfaces. I already introduced the UTF-16/UTF-8 conversion (which was done using Base64.js before). Most newer browsers have TextEncoder for that, but neither IE nor Edge support it and it is relatively easy to do ourselves. Maybe we will do a polyfill for the M$ browsers instead, but that would make the code more complicated to maintain.

Currently I'm hitting my head against the following: the unit tests, which I need to ensure backwards compatibility with the old formats and for faster development cycles, run on nodejs. Node has a crypto module (which interfaces with OpenSSL), but none of the webcrypto interfaces. There is a nodejs module that provides a webcrypto API facade in front of the crypto module, but it still lacks PBKDF2 support. The crypto module seems to offer PBKDF2, so it may just need to be exposed in the facade. I may have to help that project add PBKDF2 support first, so that I can then use that library to continue the development using node.

I did consider ignoring the node unit tests for now and instead developing purely in the browser, which is made easier thanks to the Janitor container. The problem is, I would then have to build a lot of testing tools into privatebin.js to be able to feed it the old paste formats, or I would need to generate old pastes on other servers and import them into the development instance. It would be much easier if I could just keep the test code separate from the application and use node, so I may just have to bite the bullet and help build the necessary tool first.

@rugk
Member Author

rugk commented Aug 15, 2018

Maybe we will do a polyfill for the M$ browsers instead, but that would make the code more complicated to maintain.

A polyfill would likely be better. It is also better to use the real TextEncoder in newer browsers. Actually, there is already a polyfill for the whole API.

@elrido
Contributor

elrido commented Sep 1, 2018

Another update: I found another nodejs module that implements all required webcrypto API calls, but it needed a newer nodejs. Fortunately I hadn't spent too much time on the other one, and an upgrade to Ubuntu 18.04 on my development environment promised to solve the nodejs dependency. Unfortunately it turned out that the Debian/Ubuntu maintainers messed up and had compiled nodejs 8 against openssl 1.1, while modules assume that nodejs 8 implies openssl 1.0 and fail to compile against Ubuntu's Franken-node. I found a temporary workaround for that in the ticket and also switched to a newer node version in travis and our unit testing docker container (the latter resulted in a significant speedup: running mocha with the Ubuntu node takes around 2 minutes, while the new container using Alpine's node takes 17 seconds, on the same machine).

Other issues were caused by the upgrade to gpg2 which broke my git commits as I sign them by default. Turns out you have to manually import the old gpg1 private keys.

After spending a couple of days on the actual API migration, today I finally managed to get it working, both for the old and the current pastes. It ended up not being quite as complicated as expected, but I spent a lot of time figuring out the exact formats of all the parameters, how to parse them correctly and how to convert them to the ArrayBuffer syntax and back.

It is not yet finished: There are still some remaining SJCL calls in there that need to be removed. I also used the ugly async/await workaround to emulate synchronous behaviour for the existing functions. It should therefore be regarded as a proof of concept using the existing API.

@rugk, if you have time, how would you suggest we change the logic to be able to use a callback or a promise with the cipher/decipher functions? This would allow us to make proper use of the webcrypto API's asynchronous nature.

@rugk
Member Author

rugk commented Sep 1, 2018

Async and await are no workaround but perfectly fine features. AFAIK you only use them for the crypto API, so it's okay.
But without them you could probably(!) speed stuff up.

So as for your question: theoretically I would prefer promises, but look at browser support. Switching to them would be another round of refactoring the whole code, if we want to do it everywhere and get rid of most callbacks.

Also, async/await AFAIK has rather poor support in older browsers.

@cryptomilk
Collaborator

Don't you still use sjcl? If you need to support it for backwards compatibility and a browser doesn't have the WebCrypto API you need, then just use sjcl?

@elrido
Contributor

elrido commented Sep 2, 2018

No, as the current work proves, we can get rid of SJCL completely and still be able to decrypt even the old ZeroBin pastes (the ones generated using the incompatible Base64.js encoding).

Reg. async: Ok, so async/await and promises are out of the window then, due to poor browser support. So we would have to use traditional callbacks. @rugk, I see that we use these methods in five locations and four methods:

  1. Uploader.setData
  2. PasteDecrypter.decryptOrPromptPassword
  3. PasteDecrypter.decryptPaste
  4. PasteDecrypter.decryptComments

decryptOrPromptPassword already works asynchronously, in case the password is incorrect and gets asked for via prompt, so it should not be too difficult to adapt it.

decryptPaste would need to be rewritten slightly, moving the bits after the "on success, show paste" step into the callback.

In decryptComments we need to wait for two async calls to decrypt the comment and the nickname, but the logic could be inverted: Instead of passing the results into the DiscussionViewer.addComment call, we would instead run that in a callback once the two results are in (maybe two callbacks, one for the comment, then optionally a second callback for the nickname in a second async call). The tricky bit here is that we rely on the comments to be decrypted in the given sequence for display reasons. We may have to extend the DiscussionViewer with a method inserting an invisible placeholder to inject the final comment into, when ready.

What I am not sure how to handle is the Uploader.setData method. Any suggestions on how the Uploader's interface would need to be changed to allow it to handle the async encryption? How do we prevent the run if the setData calls aren't done yet?
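
One hypothetical way to do it (this is not the actual Uploader interface; all names and the endpoint are illustrative): setData() could store the pending values as promises, and run() would only fire the request once all of them have resolved.

var UploaderSketch = {
    pendingData: [],

    setData: function (name, valuePromise) {
        // remember the promise; the (encrypted) value is filled in once it resolves
        this.pendingData.push(
            Promise.resolve(valuePromise).then(function (value) {
                return [name, value];
            })
        );
    },

    run: function () {
        // wait for every setData() call to finish before sending the request
        return Promise.all(this.pendingData).then(function (entries) {
            var data = {};
            entries.forEach(function (entry) { data[entry[0]] = entry[1]; });
            return fetch('/', {
                method: 'POST',
                headers: { 'Content-Type': 'application/json' },
                body: JSON.stringify(data)
            });
        });
    }
};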

@rugk
Member Author

rugk commented Sep 2, 2018

Ok, so async/await and promises are out of the window then, due to poor browser support.

Well… I would love to use it. Actually the only problem is IE 11.
But IE 11 only supports the Crypto API itself poorly, e.g. lacking deriveKey completely.
For other operations, it returns a "KeyOperation" instead of a Promise, whatever that is…

Can't we just drop IE support? Or fall back to sjcl for it?

@rugk
Member Author

rugk commented Sep 2, 2018

Looking at your implementation, you use 10 000 rounds of PBKDF2-SHA256. This seems to be the minimum, see borgbackup/borg#77. Of course, for the random key, I guess it does not matter. But as for the password, we should pass that through a larger number of key stretching rounds.

Also, I do not feel comfortable with just appending the password to the key. I think some more elaborate solution would be needed. (But this is just about decrypting sjcl-encrypted pastes, so for that we cannot change the format anyway.)

Also, this here contradicts itself (likely just a wrong comment):

["encrypt"] // we may only use it for decryption

@rugk
Member Author

rugk commented Sep 2, 2018

FYI here is Mozilla's implementation in Send, which we may take as some kind of reference.
E.g. what I see there is that we may really need to add a nonce, to make sure the plaintext that is encrypted is never the same (or at least never has the same size).

@elrido
Copy link
Contributor

elrido commented Sep 2, 2018

Quick reminder to get back on track: At the moment the goal is to implement the current logic, so that we can replace SJCL entirely. Of course once we start implementing the new paste format, see above, we can massively increase the standards we apply.

Before we get there, though, I want to pin down the API, so that we can then focus on the implementation without having to touch the unit tests, and can rely on them to bring to light if we mess up the legacy paste support.

So, given that we never cared much about IE (since the days of ZeroBin there was a special warning just for the IE users) and that Edge, the current MS browser stack, supports both webcrypto and promises, we should probably go with that.

You may have noticed that the travis builds for the webcrypto branch started failing frequently since yesterday. The reason was that promises can take a while to get processed in the background, especially if you launch 100 of them simultaneously to test the crypto, and they messed with the DOM state of subsequent tests that ran in parallel to them. I "fixed" this in the morning by severely reducing the number of launched crypto tests and it is stable again, but it shows that the async/await pattern, while seemingly making your code more linear, simply hides the complexity of the promises behind it.

So yes, let's go with promises then.

What's your suggestion on how the Uploader class should get extended/rewritten/refactored to support this? Could the Uploader.run() be triggered from the promise's then() method? Should we show a status of "encrypting" or similar until it gets triggered and changes the status to "sending"?

@elrido
Contributor

elrido commented Sep 2, 2018

Reg. nonce: We already use a random initialization vector (128 bits), so no two plaintexts should get the same cipher text anyway. Also, there is a random salt (64 bits) used in the key derivation from the randomly generated key (256 bit).

Of course the IV and the PBKDF2 salt are known to anyone trying to brute force the key of an offline-stored paste, and increasing the iterations and raising the HMAC hash to SHA-512 will slow this down. -> New paste format; focus on adjusting the API first, please.

@rugk
Member Author

rugk commented Sep 2, 2018

So yes, let's go with promises then.

No, I am sorry, but you do not seem to understand the async/await feature. So just let me explain a few small things:

  • Yes, async/await is still asynchronous code(!), i.e. it behaves the same as Promises.
  • Using async/await does exactly the same as Promises, it's just a different syntax(!), i.e. only how you write the code is different. You can always rewrite code in both ways (see the small example below).
  • async/await does not replace Promises or so. You may always use them together…
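
For illustration only (names are made up, this is not PrivateBin code): the same decryption written once as a promise chain and once with async/await, both returning a promise and behaving identically.

// promise chain
function decryptWithThen(key, iv, cipherText) {
    return crypto.subtle
        .decrypt({ name: 'AES-GCM', iv: iv }, key, cipherText)
        .then(buffer => new TextDecoder().decode(buffer));
}

// async/await - just different syntax on top of the same promise
async function decryptWithAwait(key, iv, cipherText) {
    const buffer = await crypto.subtle.decrypt({ name: 'AES-GCM', iv: iv }, key, cipherText);
    return new TextDecoder().decode(buffer);
}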

As for the other stuff, I need to look into it. Not now, however…

As such, we will of course go with async/await; it often makes using Promises easier. And if we drop IE, we can use it, so all is fine here.

As I said, this may just involve a whole round of refactoring…

Reg. nonce

Yep, a nonce has nothing to do with the IV. I guess it is mainly there just to hide the size of the encrypted content.
But yes, certainly that's about the new paste format.

@rugk
Member Author

rugk commented Oct 8, 2018

Okay, I've adjusted the Uploader object and from what I see, I guess it now encrypts correctly. However, the result is a JSON object and when it is accessed later it is shown as [object Promise]. I guess, however, that is something in CryptTool.

@elrido
Contributor

elrido commented Oct 20, 2018

Your implementation was a very helpful example and I managed to implement solutions for the other listed cases. There was a point where it "clicked" in my head and the promise concept started to make sense for separating the scope of the various involved components.

I can use the decryption promise and attach UI related actions to it and pass it on, to trigger further actions when multiple related promises get fulfilled in yet another layer. This is much cleaner than having to build callback interfaces into many methods and limits the scope that I have to consider in each of them.

Anyway, we now again have a working UI and no dependencies on SJCL at all, even for the current format. Next comes the interesting part of implementing the new format we discussed so long ago. I should document this in the wiki pages as I go, both for the old format (that we still need to read) and the new one (read & write).

@rugk
Member Author

rugk commented Oct 20, 2018

This is much cleaner than having to build callback interfaces into many methods and limits the scope that I have to consider in each of them.

Exactly! It totally avoids this "callback hell".

@rugk
Member Author

rugk commented Oct 20, 2018

Also, this may speed up the whole encryption/decryption, as we can start promises one after another and let browsers potentially use multithreading or whatever they do to encrypt/decrypt multiple things at the same time.
I think this could be used for the comments (as they are encrypted separately).
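
A minimal sketch of that idea (decipherPromise and showComment are hypothetical names, not the actual API): all comment decryptions are started at once, and Promise.all() preserves the input order in its result array, so the comments can still be displayed in their original sequence.

function decryptAllComments(comments, decryptComment) {
    // decryptComment(comment) is assumed to return a promise resolving to the plaintext
    return Promise.all(comments.map(decryptComment));
}

// usage sketch:
// decryptAllComments(paste.comments, c => decipherPromise(key, password, c))
//     .then(plaintexts => plaintexts.forEach(showComment));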

@elrido
Contributor

elrido commented Oct 21, 2018

I have reviewed the original proposal for the version 2 paste format and came up with a streamlined version, which I have put into the encryption format wiki page for review.

The key changes are:

  • make the encryption wrapper the top level object, instead of it being used for multiple encrypted parts - the encrypted text now contains a JSON containing all encrypted parts (paste, attachment, attachment file name, comment, nickname, etc.)
  • compression is now a flag, so far I document "none" and "zlib" (by which I mean a proper deflate implementation, not the Dan Kogai one) - it needs to be readable to the client, otherwise it doesn't know if it needs to decompress the decrypted data or not.
  • moved static meta data (that is generated by the client during paste/comment creation) to the adata section, so it is authenticated during encryption, but still readable by the server (documented why the server needs to read these)
  • dynamic meta data (generated by the server) remains in the non-authenticated meta section
  • introduced children as a property, in case we want to implement Versioned pastes #255
  • the key derivation avoids using base64, hex or sha256 functions on the key and password and just appends them, avoiding potential weaknesses introduced by this - we fully rely on the PBKDF2 algorithm instead
  • the key derivation iterations already got increased to 10000 in the webcrypto branch - this is backwards compatible with SJCL

Let me know what you think.
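
To illustrate the key-derivation bullet above with a minimal sketch (function names are illustrative): the random paste key bytes and the optional password bytes are simply concatenated and handed to PBKDF2, without any base64/hex/sha256 massaging in between.

function getPassphrase(pasteKeyBytes, password) {
    const passwordBytes = new TextEncoder().encode(password || '');
    const passphrase = new Uint8Array(pasteKeyBytes.length + passwordBytes.length);
    passphrase.set(pasteKeyBytes, 0);
    passphrase.set(passwordBytes, pasteKeyBytes.length);
    // the result is fed into crypto.subtle.importKey('raw', …, 'PBKDF2', …) and then deriveKey()
    return passphrase;
}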

@rugk
Member Author

rugk commented Oct 21, 2018

JSON containing all encrypted parts ([…] comment […])

What do you mean by comment? I guess these have to be encrypted separately, as they need to be addable by other users/anybody other than the paste creator.
Otherwise, if this is all encrypted at once, then users would also be able to change the paste content when they add a comment. I doubt this is what we want. 😄

But generally yes, all paste content goes into one big JSON there. Also, I'd suggest adding a JSON entry nonce, whose aim is to hide the length of the message (in contrast to the salt, which is just used to prevent the same plaintext from being encrypted to the same cipher text).
As for generating it, we can just add somewhere from 0 to 50 zeros or so. I've noticed such a use in the past; you can find a discussion at https://crypto.stackexchange.com/questions/31436/is-pkcs7-padding-generated-with-a-pseudo-random-number-generator-okay. Be sure to read the comments too. Since we also use a block cipher it may not be needed, but it may be useful if we have many short messages, whose block length would then always be the same.

compression

Yeah, okay, I see no big risk with that, but it should then at least be authenticated? I mean, it should be possible to put it in that part, shouldn't it?
Generally, in the wiki it is not really clear to me which part is now only authenticated and which is just plain text.
Maybe we can better illustrate it with some draw.io drawing? So you can see at one glance what is encrypted and what is not?

dynamic meta data (generated by the server) remains in the non-authenticated meta section

Can we then name this section something like "serverMetaData" or "unauthenticatedMeta" or so?

introduced children as a property, in case we want to implement #255

Hmm, I don't think this is really needed. If we really implemented that in the future, we may rather need "parents" (as the parent does not know it gets a child when being encrypted), or we may internally implement it with a "hidden" comment or something like this.
I'd just say we do not need to think about this now. If we later implement it, we just bump up the paste version and then we can always change the JSON format.

key and password and just appends them

Yeah, generally good I think, but:

  • we do not need such a high number of PBKDF2 rounds for a randomly generated key. As such, maybe first use many rounds on the password (if used) to properly protect it, and then use just a "few" (say 1000) PBKDF2 rounds when combining it with the paste key.
    This way we may use a higher number of rounds for the password while preventing too much slow-down, as we do not need to pass the paste key through PBKDF2 as often.
  • combining the password and the static key is a thing I have to read up on. I am not sure just appending is good/secure. I've also heard it is often somehow XORed, but I don't know what's good here.

this is backwards compatible with SJCL

What do you mean by that? You can actually decrypt pastes where fewer rounds were used with much more rounds now? This sounds very strange to me…


BTW as this is a big change, why not name the next release "PrivateBin v2"?

@elrido
Contributor

elrido commented Oct 21, 2018

  • comments: The encryption format described in that page is referenced in the API page. If you look at one of the calls, you can follow the links to our JSON-LD schema files, that are both an example of how the message is returned and a specification on how to parse them. Looking at the first example, you can see how the paste contains a comments property which in itself uses the encryption format described here, just with different contents (comment and nickname instead of paste and attachment).
  • nonce: This is a new feature request we hadn't discussed before. My understanding of AES type block ciphers was that they always return a multiple of the chosen block size (256 bit in our case) and will pad these if necessary. I am not a crypto expert, so if that is not the case then, yes, we would need to implement that ourselves. Most comments will be short, but we are not guaranteeing secure communication, just that a server admin can claim they didn't know the contents and aren't held responsible for them. We are leaking way too much meta data in our communication anyway for the message size to matter. If multiple people post messages of around the same size to the server, the content may or may not be identical.
  • compression: That statement is probably true of other properties like salt, iv, etc. AFAIK all of these are known before we encrypt the message and are neither changed by the encryption nor should they later be changed. It should make sense to authenticate all of them. We wouldn't have been able to do that with SJCL, but with webcrypto we just need to stick them in there.
  • dynamic meta: Since we are talking about only two properties, we could just move them to the top level instead.
  • children: The way the feature is described in Versioned pastes #255 the parent needs to know its children and link to them, not the other way around. So we need to link children in a paste and not parents. If you feel otherwise, please discuss this over in that ticket. The fact that I document this, doesn't imply we absolutely have to implement this feature now, can be a next release.
  • 10000 PBKDF2 rounds: I think this was the NIST recommendation that we were scolded for not using before in Using KDF for protecting against bruteforce attacks #350. From that ticket I had gotten the impression 10k is still rather conservative.
  • combining: It's what we do so far. My understanding was that this way, the password adds complexity to the fixed-length key. Before, we had been doing weird things, like base64 encoding, using sha256 and hex and then combining them. You (rightfully) criticised this as doing our own crypto with unknown side effects. Hence we now just make it random and long and let PBKDF2 do the crypto on its own.
  • backwards compatible: We wouldn't have had to switch to webcrypto or needed to change the format to increase the iterations. SJCL just uses the value given with the message. Of course you can't change the value on existing pastes. Would require re-encryption, which is incompatible with our use case. But we have to expect to handle pastes that contain comments with multiple different encryption formats. And this we can.
  • version number: let's discuss that when we get there. So far we plan no big usability changes, so 2.0 seems a hard sell. It will work just like now, just on a different backend. 1.4 was planned to have big UI improvements.

@rugk
Member Author

rugk commented Oct 21, 2018

comments: I take this as a "no, it's not contained in the paste".
nonce:

This is a new feature request we hadn't discussed before.

So what? Even if we have not, this whole issue is about discussing the new format. I do feel a bit attacked by that sentence, as if I were not allowed to discuss a new feature.
Apart from the fact that we indeed have discussed it before.

As said, and as linked on crypto.stackexchange.com, it may not really be needed, but it could be useful.

compression: Yeah, exactly. In any case, it won't hurt to do that. 😃

children: yeah, IMHO let's not add this now. As said, thanks to the versioning of the encryption format we can now easily add a new property (for that) later.

think this was the NIST recommendation

Yes, it was this (highlighting by me):

"Therefore, the iteration count SHOULD be as large as verification server performance will allow, typically at least 10,000 iterations."

So that's the absolute minimum. Actually, we can aim for a 1 second delay. However, as the delay totally depends on the device, we might have to test mobile devices or so and check what number we can use.
Discussion on this is also here
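
Purely for illustration (not something settled here, all names are made up): one way to check what a device can handle is to time a probe derivation with the Web Crypto API and scale the iteration count towards a target delay.

async function calibrateIterations(targetMs = 1000, probeIterations = 10000) {
    const passphrase = crypto.getRandomValues(new Uint8Array(32));
    const salt = crypto.getRandomValues(new Uint8Array(8));
    const baseKey = await crypto.subtle.importKey(
        'raw', passphrase, 'PBKDF2', false, ['deriveBits']
    );

    const start = performance.now();
    await crypto.subtle.deriveBits(
        { name: 'PBKDF2', salt: salt, iterations: probeIterations, hash: 'SHA-256' },
        baseKey,
        256
    );
    const elapsed = performance.now() - start;

    // scale linearly from the probe run towards the target delay
    return Math.round(probeIterations * targetMs / elapsed);
}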

combining: As said, I'll look it up…

Would require re-encryption, which is incompatible with our use case.

Totally off-topic, but thinking about this: if we continue like this we would always have "legacy" pastes in PrivateBin. Even the old format should be quite okay (i.e. "secure"), but it may not be – especially not in the long term. As such, what we can do is detect the "legacy" format and show a suggestion/info message to the user to "press clone" and continue using the paste in the new format. But hmm… you lose the comments then, of course. 😢

version number: Yeah, okay. 😄

@elrido
Contributor

elrido commented Oct 23, 2018

comments: The JSON-LD clearly documents that comments are an array in the paste? See below...

nonce: Sorry, that was not my intention. Of course we can discuss it, but please in a dedicated ticket assigned to the 1.3 milestone, and then we can add it to the format in here. The comment you refer to is about the iv, which always was and needs to remain part of the format, as long as we use AES, which requires an initialization vector.

compression: I'll create a JSON-LD below incorporating this, maybe then my proposition is clearer.

children: It's just in there to document where it would go in case someone actually implements that ticket. Not my prio, but it would clearly belong with the encrypted payload and not the clear text part.

iterations: Just suggest a number. It can still be changed at any point up to the release. Let's assume the 10000 is a placeholder until then.

combining: Sorry, I thought it was obvious that we already do that and will continue doing so. I have seen pastes with comments in three different old formats. The original paste in ccm 128 bit, some comments the same, later comments in ccm 256 bit, the latest in gcm 256 bit. All on one page and all could be decrypted properly. It's like with some of my old word documents or bitmaps from 1992 that I still want to read occasionally. Sure, at some point conversion may be an option, but in our model that would require overwriting the message, so the clone would be an alternative for that?

Anyhow, here is the updated suggestion, this time expressed as JSON-LD in the hope of making the intention clearer. This is the response a client would get upon requesting a paste:

{
	"@context": {
		"so": "https://schema.org/",
		"pb": "https://privatebin.org/",
		"status": {"@id": "so:Integer"},
		"id": {"@id": "so:name"},
		"deletetoken": {"@id": "so:Text"},
		"url": {
			"@type": "@id",
			"@id": "so:url"
		},
		"v": {
			"@id": "so:Integer",
			"@value": 2
		},
		"ct": {
			"@type": "so:Text",
			"@id": "pb:CipherText"
		},
		"adata": {
			"@id": "pb:AuthenticatedData",
			"@container": "@list",
			"@value": [
				{
					"@type": "so:Text",
					"@id": "pb:InitializationVector"
				},
				{
					"@type": "so:Integer",
					"@id": "pb:Iterations"
				},
				{
					"@type": "so:Integer",
					"@id": "pb:KeySize"
				},
				{
					"@type": "so:Integer",
					"@id": "pb:TagSize"
				},
				{
					"@type": "so:Text",
					"@id": "pb:Mode",
					"@value": "gcm"
				},
				{
					"@type": "so:Text",
					"@id": "pb:Algorithm",
					"@value": "aes"
				},
				{
					"@type": "so:Text",
					"@id": "pb:Compression",
					"@value": "zlib"
				},
				{
					"@type": "so:Text",
					"@id": "pb:Salt"
				}
			]
		},
		"meta": {
			"@id": "?jsonld=pastemeta"
		},
		"comments": {
			"@id": "?jsonld=comment",
			"@container": "@list"
		},
		"comment_count": {"@id": "so:Integer"},
		"comment_offset": {"@id": "so:Integer"}
	}
}

Note:

  1. I am introducing a new vocabulary for cryptographic primitives, since I have not found any for use in RDF or JSON-LD.
  2. I am deliberately using a list for the adata section, because arrays in JSON are ordered, while objects aren't. For decryption this part would need to be turned into a JSON string like "[x,y,z]" so that it can be fed as a buffer into the webcrypto. Using an object the order wouldn't be guaranteed by the stringifier across browser engines and programming languages. Also, an array is more compact (but less human readable).

pastemeta would be reduced to:

{
	"@context": {
		"so": "https://schema.org/",
		"postdate": {"@id": "so:Integer"},
		"remaining_time": {"@id": "so:Integer"}
	}
}

The comments would be changed similarly: a single ct section, static settings into the adata and minimal meta section.

To avoid any confusion: The above is a schema describing the message structure, not what the message actually looks like. The properties starting with "@" are descriptors of the syntax, "so" and "pb" contain vocabularies for the syntax.
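
To make note 2 concrete, a small sketch (values are placeholders, following the adata order proposed above): the adata is an ordered array, so JSON.stringify() produces the same byte sequence across engines, and that byte sequence is what AES-GCM authenticates as additional data.

const adata = [
    '<base64 iv>',     // pb:InitializationVector
    10000,             // pb:Iterations
    256,               // pb:KeySize
    128,               // pb:TagSize
    'gcm',             // pb:Mode
    'aes',             // pb:Algorithm
    'zlib',            // pb:Compression
    '<base64 salt>'    // pb:Salt
];
const additionalData = new TextEncoder().encode(JSON.stringify(adata));
// later: crypto.subtle.encrypt({ name: 'AES-GCM', iv: iv, additionalData: additionalData }, key, payload)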

@rugk
Member Author

rugk commented Oct 24, 2018

Ah, okay, thanks. IMHO, that LGTM.

What just came to my mind was to duplicate some of the non-authenticated data in the adata part, so it can be authenticated after decryption. This could prevent some awkward attacks like a version downgrade (if a hypothetical new v3 format could be downgraded to v2) or the server serving a different paste for an old paste ID (by including the paste ID in the duplicated data).

@elrido
Contributor

elrido commented Dec 17, 2018

It took forever and several attempts to get the JSON-LD done. I have to say that JSON-LD as a DSL is helpful, as it forces you to think about how you will use the described data structure, and so I discovered some flaws in the above proposal before I even attempted to implement it in JS or PHP.

Anyhow, here are the files describing the format incl. key differences to the above:

types.jsonld - introduced to document PB specific data types used in the pastes and comments, including crypto primitives. This is partially there as documentation for humans (defaults in '@value', min & max or options in '@enum'), but we can and should also use it for feature detection. Examples:

  1. Formatter should list the actually supported formats (admins can remove some formats in the configuration)
  2. Expire should list the configured expiration options (configurable option)

The JSON strings contained in the Base64 encoded cipher text are documented as PasteCipherMessage and CommentCipherMessage. I decided to unify paste and attachment in one message, so looking at the stored data there isn't any indication whether a paste contains an attachment (except for the size) or whether a comment has a nickname or not (which can be deduced from the icon property).

paste.jsonld & pastemeta.jsonld - mostly as described in the proposal. Things in 'adata' are created by the client and can't be changed on the server (or the decryption fails, which I have to test in browsers), while the meta contains server side information. Using the adata has the downside that we have to return these properties to the client in the data (some of it is irrelevant for displaying the paste), but I think this slight overhead is acceptable for the gain.

comment.jsonld & commentmeta.jsonld - similar to paste in structure. Note that the vizhash is now the more generic 'icon' property, but still part of the server side generated meta section. When creating a comment, setting that property should be used to indicate to the server that it needs to generate one, since the nickname, which we used to detect this so far, is now "hidden" in the cipher text.

Other changes: I tried to make the formatting more consistent for readability and removed the incorrect use of '@id'.

I think next I'll work on the JS implementation, then the PHP side (support validation of new format, changes of DB tables, if necessary).

@rugk
Member Author

rugk commented Dec 17, 2018

while the meta contains server side information. Using the adata has the downside that we have to return these properties to the client in the data (some of it is irrelevant for displaying the paste), but I think this slight overhead is acceptable for the gain.

(1)
I am also not worried about the overhead, but just to get my head around it: So what is irrelevant for the client, exactly? This one? But that's not authenticated, is it?
AFAIK all the other stuff (except for the "Expire", possibly) is required by the server?

But taking the "Expire" I wonder whether it is useful to put it into the adata part anyway? I mean, the server can likely use it, but cannot verify the authenticity anyway, as it does not have the key?
So maybe we can at least double-check it on the client afterwards, i.e. if "CURRENT_TIME" > "Expire" discard the paste and do not display it? (when the server fails due to a wrong time zone or so that may be useful, or just as a fallback – and to actually use the authenticated data…)
That said, if it works, we can (and possibly rather should) of course leave it in adata… if it does not hurt, it's all right.

(2)
BTW, seeing this: is the "creation time" actually also sent to the client? If so, that would be a new privacy concern/"metadata leak" we would have to be aware of, as AFAIK the client currently does not know when the paste was created…
(this also applies to the expire time – so is it bad if the client knows this?)

(3)
These two sections are completely the same.
Thinking ahead to implementing them, can't you somehow "merge" them in the spec, i.e. as an extra JSON object "encryptionProperties" or so? Then you could have one JS function that takes all of these as input for both and returns the JS object…? Just as an idea… it would likely also make the spec more readable.
It could also just be a "base" that is extended by the properties you add for paste/comment? (Or does JSON-LD even support "inheritance" somehow, so you can have a common "base" "class"?)


BTW, generally speaking, these JSON-LDs look great and seem to be a great tool for this "spec work". 😄

@elrido
Contributor

elrido commented Dec 17, 2018

  1. The whole AuthenticatedData type would be the contents of the adata. The two unnecessary properties (for the client) are Expire and BurnAfterReading. I wouldn't mind moving them to the meta section, but then again it is nice to know that an admin can't just change the expiration retroactively, or that a paste isn't actually a BurnAfterReading one that the server doesn't delete after all. The authenticity can't be validated by the server (it requires a key) but it is by the client. Decryption should fail if something gets changed in there. Ideally the web crypto API even throws a separate exception in this case, so we get to display a meaningful message.
  2. Actually the creation time is only used by the server. It is used, together with the expiration setting, to calculate the expiration timestamp and then decide if the paste needs to be deleted. We could keep it internal in the format and not actually expose it in the API.
  3. Yes, we could. The only restriction is that due to JSON parsers not being required to preserve the order of properties in objects, we have to use lists/arrays with fixed offsets for the adata. But we could use nested arrays: [[crypto,params],other,params] for paste and just [crypto,params] for the comments.
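
A tiny sketch of point 3 (values are placeholders, in the order of the proposal above): the cryptographic parameters become one nested array shared by pastes and comments, and any paste-only properties are appended after it.

const cipherParams = ['<base64 iv>', 10000, 256, 128, 'gcm', 'aes', 'zlib', '<base64 salt>'];
const pasteAdata   = [cipherParams /*, other paste-level properties appended here */];
const commentAdata = [cipherParams]; // comments only carry the crypto parameters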

@elrido
Contributor

elrido commented Dec 17, 2018

I have implemented the changes for point 3 in 1de57c8.

Correction on 2., though: That time stamp is actually used in the comments for displaying the comment's date (in localized formatting). We could still keep the paste's creation date internal (it currently gets exposed).

@rugk
Member Author

rugk commented Dec 17, 2018

  1. Yeah… okay, so that's why I thought, if we elaborate on that, the client may actually also verify the values it gets "semantically", i.e. for the "expire" time, check whether the server should already have deleted the paste… I.e. show an error (that could point out a server setup issue or app bug) or just refuse to display a paste if "expire >> now"…
    "BurnAfterReading" is hard to check, I guess. (I mean possibly not, the client could do another request and check whether that fails, but this may be a little too much checking and may result in other issues – see also: leaks metadata)
    I mean, at that stage it hardly detects any malicious server that could tamper with the expire time or so, as – if you want to do that – you've got easier ways to do it, and we actually always assume a "trusted server", but yeah… it's not bad.

We could still keep the paste's creation date internal (it currently gets exposed).

Okay, that's a "metadata leak" that currently is there. So IMHO, we should keep it secret. It has no value on the client and for a paste creator it is not obvious it is exposed.
The other option would be: Display it somewhere in the UI. Then, it's obvious for a paste creator that it is shown…
But hey, this is a new issue, so here you go #390.

@rugk
Member Author

rugk commented Dec 22, 2018

BTW the MDN explanations of the WebCrypto API have been mostly rewritten and they already look quite good, IMHO.

@rugk
Member Author

rugk commented Dec 25, 2018

BTW another resource that explains how we should do it very well, e.g.: https://timtaubert.de/talks/keeping-secrets-with-javascript/

@elrido
Contributor

elrido commented Dec 25, 2018

Thanks for sharing. I implemented that bit a few months back. Since you familiarized yourself with the topic, it would be helpful if you could compare those recommendations with the current state of implementation in the webcrypto branch and let me know if anything sticks out to you.

I'm a bit in the middle of things: The cipher was switched to the new format, but currently doesn't compress. decipher now supports both the legacy and the current state of the new format as provided by cipher. The server side is still untouched, so it currently doesn't work at all as a webservice and I have to rely on the JS unit tests to tell me if I broke the compatibility or not. Next I want to switch out the compression for newly created pastes and then start to look at what needs to be changed in the PHP side, incl. database table changes.

@mqus

mqus commented Dec 14, 2020

I know this has been done for two years now, but I am building a very simple client right now and parsing the heterogeneously typed list for adata is a real PITA, even more so since it is nested.

Most programming languages help with setting up json <-> data type deserialization but those lists really make simple parsing difficult, especially if said languages have static typing. It would have been nicer (for other languages) to just explicitly order the parts for the authentication and use normal objects otherwise.

Please consider changing that for a possible v3 API :)

@rugk
Member Author

rugk commented Dec 15, 2020

Hi @mqus and great to hear you are building a PrivateBin client. Don't forget to add it to https://github.com/PrivateBin/PrivateBin/wiki/Third-party-clients. 😉 😃

As for your feedback I'm afraid I don't exactly get what you suggest. Yes we use JSON and store our adata there.
In any case, there is no need to discuss this here.
If you have a suggestion, please create a new issue and explain (possibly with examples) how the API may be improved. That does not mean it will happen, but at least we will get an idea and can consider it. 😃
