I don't understand the need for this level of engineering. It appears we are going for an opaque bearer token here. The checksum is pointless because an entire 512 bit token still fits in an x86 cache line. Comparing the whole sequence won't show up in any profiler session you will ever care about.
If you want aspects of the token to be inspectable by intermediaries, then you want json web tokens or a similar technology. You do not want to conflate these ideas. JWTs would solve the stated database concern. All you need to store in a JWT scheme are the private/public keys. Explicit tracking of the session is not required.
> The checksum is pointless because an entire 512 bit token still fits in an x86 cache line
I suppose it’s there to avoid round-trip to the DB. Most of us just need to host the DB on the same machine instead, but given sharding is involved, I assume the product is big enough this is undesirable.
The point of the checksum is to just drop obviously wrong keys. No need to handle revocation or do any DB access if checksum is incorrect, the key can just be rejected.
Hey OP, sorry for the negativity, I think most of these commenters right now are pretty off-base. My company is building a lot of API infrastructure and I thought this was a great write up!
While it's true that API keys are basically prefix + base32Encode(ID + secret), you will want a few more things to make secure API keys: at least versioning and hashing metadata to avoid confused deputy attacks.
GitHub introduced checksums to their tokens to aid offline secret scanning. AFAIK it’s mostly an optimization for that use case. But the checksums also mean you can reveal a token’s prefix and suffix to show a partially redacted token, which has its benefits.
Side note: the slug prefix is not primarily intended for the end-user / developer to figure out which kind of key it is, but for security scanners to detect when they are committed to code / leaked and invalidate them.
Reading “hex” pointing to a clearly base62-ish string was a bit interesting :-)
Also, could we shard based on a short hash of account_id, and store the same hash in the token? This way we can lose the whole api_key → account_id lookup table in the metashard altogether.
I know sometimes people just like to try things out, but for the love of god do not implement encryption related functionality yourself. Use JWT tokens and OpenSSL or another established library to sign them. This problem is solved. Not essentially solved, solved.
Creating your own API key system has a high likelihood of fucking things up for good!
You don't need any encryption or signing for API keys. Using JWTs is probably more dangerous here, and more annoying for people using the API since you now have to handle refreshing tokens.
Plain old API keys are straightforward to implement. Create a long random string and save it in the DB. When someone connects to the API, check if the API key is in your DB and use that to authenticate them. That's it.
I don't understand the need for this level of engineering. It appears we are going for an opaque bearer token here. The checksum is pointless because an entire 512 bit token still fits in an x86 cache line. Comparing the whole sequence won't show up in any profiler session you will ever care about.
If you want aspects of the token to be inspectable by intermediaries, then you want json web tokens or a similar technology. You do not want to conflate these ideas. JWTs would solve the stated database concern. All you need to store in a JWT scheme are the private/public keys. Explicit tracking of the session is not required.
> The checksum is pointless because an entire 512 bit token still fits in an x86 cache line
I suppose it’s there to avoid round-trip to the DB. Most of us just need to host the DB on the same machine instead, but given sharding is involved, I assume the product is big enough this is undesirable.
You need to support revocation, so I'm not sure it's ever possible to avoid the need for a round trip to verify the token.
The point of the checksum is to just drop obviously wrong keys. No need to handle revocation or do any DB access if checksum is incorrect, the key can just be rejected.
> I assume the product is big enough
Experience tells otherwise
Hey OP, sorry for the negativity, I think most of these commenters right now are pretty off-base. My company is building a lot of API infrastructure and I thought this was a great write up!
While it's true that API keys are basically prefix + base32Encode(ID + secret), you will want a few more things to make secure API keys: at least versioning and hashing metadata to avoid confused deputy attacks.
Here is a detailed write-up on how to implement production API keys: https://kerkour.com/api-keys
I don't like giving away any information what-so-ever in an API key, and would lean towards a UUIDv7 string, just trying to avoid collisions.
Even the random hex with checksum component seems overkill to me, either the API key is correct or it isn't.
GitHub introduced checksums to their tokens to aid offline secret scanning. AFAIK it’s mostly an optimization for that use case. But the checksums also mean you can reveal a token’s prefix and suffix to show a partially redacted token, which has its benefits.
Side note: the slug prefix is not primarily intended for the end-user / developer to figure out which kind of key it is, but for security scanners to detect when they are committed to code / leaked and invalidate them.
Hello everyone this is my third blog, I am still a junior learning stuff ^_^
Hey, welcome to HN!
Reading “hex” pointing to a clearly base62-ish string was a bit interesting :-)
Also, could we shard based on a short hash of account_id, and store the same hash in the token? This way we can lose the whole api_key → account_id lookup table in the metashard altogether.
The purpose of the checksum is to help secret scanners avoid false positives, not to optimize the (extremely rare) case where an API key has a typo
Hey - this was a great blog ! I liked how you used the birthday paradox here.
PS : I too am working on a APIs.Take a look here : https://voiden.md/
I know sometimes people just like to try things out, but for the love of god do not implement encryption related functionality yourself. Use JWT tokens and OpenSSL or another established library to sign them. This problem is solved. Not essentially solved, solved. Creating your own API key system has a high likelihood of fucking things up for good!
You don't need any encryption or signing for API keys. Using JWTs is probably more dangerous here, and more annoying for people using the API since you now have to handle refreshing tokens.
Plain old API keys are straightforward to implement. Create a long random string and save it in the DB. When someone connects to the API, check if the API key is in your DB and use that to authenticate them. That's it.
I would add the capability to be able to seamlessly rotate keys.
But otherwise, yes, for love of everything holy - keep it simple.
The securify here comes from looking the key up in the DB, not from any crypto shenanigans.