<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Geoffroy Couprie, Author at Clever Cloud</title>
	<atom:link href="https://stagingv6.cleverapps.io/blog/author/geoffroy-couprieclever-cloud-com/feed/" rel="self" type="application/rss+xml" />
	<link></link>
	<description>From Code to Product</description>
	<lastBuildDate>Mon, 02 Oct 2023 14:33:25 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	

<image>
	<url>https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2023/03/cropped-cropped-favicon-32x32.png</url>
	<title>Geoffroy Couprie, Author at Clever Cloud</title>
	<link></link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Biscuit tutorial</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2021/04/15/biscuit-tutorial/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Thu, 15 Apr 2021 11:25:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[authentication]]></category>
		<category><![CDATA[authorization]]></category>
		<category><![CDATA[biscuit]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2021/04/15/biscuit-tutorial/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" fetchpriority="high" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-1368x528.png 1368w" sizes="(max-width: 1400px) 100vw, 1400px" /></p><p>In the <a href="https://stagingv6.cleverapps.io/blog/engineering/2021/04/12/introduction-to-biscuit/">previous article</a>, I introduced Biscuit, our authentication and authorization token, and mentioned its Datalog based language for authorization policies. Let&#39;s see how it works!</p>
<span id="more-2833"></span>

<h2 id="from-a-personal-blog-to-an-entire-newspaper">From a personal blog to an entire newspaper</h2>
<p>As an example, we will build up authorization policies, going from a small, personal blog, to a professional journal with multiple teams, editors, etc.</p>
<p>Since those policies will be written in Datalog, let&#39;s take a short look at that language first.</p>
<h3 id="side-note-introduction-to-datalog">Side note: introduction to Datalog</h3>
<p>Datalog is a declarative logic language that is a subset of Prolog. A Datalog program contains &quot;facts&quot;, which represent data, and &quot;rules&quot;, which can generate new facts from existing ones.</p>
<p>As an example, we could define the following facts, describing some relationships:</p>
<pre><code class="language-prolog">parent(&quot;Alice&quot;, &quot;Bob&quot;);
parent(&quot;Bob&quot;, &quot;Charles&quot;);
parent(&quot;Charles&quot;, &quot;Denise&quot;);
</code></pre>
<p>This means that Alice is Bob&#39;s parent, and so on.</p>
<p>This could be seen as a table in a relational database:</p>
<table class="table">
<thead>
<tr>
<th>parent</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody><tr>
<td></td>
<td>Alice</td>
<td>Bob</td>
</tr>
<tr>
<td></td>
<td>Bob</td>
<td>Charles</td>
</tr>
<tr>
<td></td>
<td>Charles</td>
<td>Denise</td>
</tr>
</tbody></table>
<p>We can then define rules to query our data:</p>
<pre><code class="language-prolog">parent_of_charles($name) &lt;-
  parent($name, &quot;Charles&quot;);
</code></pre>
<p>This could be written in SQL as:</p>
<pre><code class="language-sql">SELECT DISTINCT name FROM parent WHERE child = &#39;Charles&#39;;
</code></pre>
<p>(we use <code>DISTINCT</code> because Datalog will always remove redundant results)</p>
<p>We can also use rules to create new facts, like this one: (variables are introduced with the <code>$</code> sign)</p>
<pre><code class="language-prolog">grandparent($grandparent, $child) &lt;-
  parent($grandparent, $parent),
  parent($parent, $child);
</code></pre>
<p>You can read it as follows:</p>
<pre><code class="language-text">create the fact grandparent($grandparent, $child)
  IF
    there is a fact parent($grandparent, $parent)
    AND there is a fact parent($parent, $child)
    with matching $parent variable
</code></pre>
<p>or in SQL:</p>
<pre><code class="language-sql">INSERT INTO grandparent( name, grandchild )
  SELECT A.name as name, B.child as grandchild
  FROM parent A, parent B
  WHERE A.child = B.name;
</code></pre>
<p>Applying this rule will look at combinations of the <code>parent</code> facts as defined on the right side of the arrow (the &quot;body&quot; of the rule), and try to match them to the variables (<code>$grandparent</code>, <code>$parent</code>, <code>$child</code>):</p>
<ul>
<li><code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Bob&quot;, &quot;Charles&quot;)</code> matches because we can
replace <code>$grandparent</code> with <code>&quot;Alice&quot;</code>, <code>$parent</code> with <code>&quot;Bob&quot;</code>, <code>$child</code> with <code>&quot;Charles&quot;</code></li>
<li><code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Charles&quot;, &quot;Denise&quot;)</code> does not match because
we would get different values for the <code>$parent</code> variable</li>
</ul>
<p>For each matching combination of facts in the body, we will then generate a fact, as defined on the left side of the arrow, the <em>head</em> of the rule. For <code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Bob&quot;, &quot;Charles&quot;)</code>, we would generate <code>grandparent(&quot;Alice&quot;, &quot;Charles&quot;)</code>. A fact can be generated from multiple rules, but we will get only one instance of it.</p>
<p>Going through all the combinations, we will generate:</p>
<pre><code class="language-prolog">grandparent(&quot;Alice&quot;, &quot;Charles&quot;);
grandparent(&quot;Bob&quot;, &quot;Denise&quot;);
</code></pre>
<p>which can be seen as:</p>
<table class="table">
<thead>
<tr>
<th>grandparent</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody><tr>
<td></td>
<td>Alice</td>
<td>Charles</td>
</tr>
<tr>
<td></td>
<td>Bob</td>
<td>Denise</td>
</tr>
</tbody></table>
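<p>To make the matching concrete, here is a minimal sketch in Rust of the join described above. The <code>grandparents</code> function is a hypothetical helper, not Biscuit&#39;s actual Datalog engine: it pairs <code>parent</code> facts sharing the <code>$parent</code> value and deduplicates the results:</p>

```rust
// A naive version of the grandparent rule: join the `parent` facts on
// the shared $parent variable. This sketches the matching described
// above; it is not Biscuit's actual Datalog engine.
fn grandparents(parents: &[(&str, &str)]) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for &(gp, p1) in parents {
        for &(p2, child) in parents {
            // $parent must take the same value in both facts
            let fact = (gp.to_string(), child.to_string());
            if p1 == p2 && !out.contains(&fact) {
                out.push(fact); // facts are deduplicated, as in Datalog
            }
        }
    }
    out
}

fn main() {
    let parents = [("Alice", "Bob"), ("Bob", "Charles"), ("Charles", "Denise")];
    for (gp, child) in grandparents(&parents) {
        println!("grandparent(\"{}\", \"{}\");", gp, child);
    }
}
```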
<p>Interactions with a Datalog program are done through queries: <strong>a query contains a rule</strong> that we apply over the system, and <strong>it returns the generated facts</strong>.</p>
<h3 id="first-steps-personal-blog">First steps: personal blog</h3>
<p><em>Note:</em> you can follow along with the various steps of this tutorial in the <a href="https://play-with-biscuit.cleverapps.io/blog.html">online playground</a>.</p>
<p>When we are the only user of that blog, we do not need much (honestly, we could get away with just a random string in a cookie, but bear with me). We only need a way to identify ourselves to the blog engine&#39;s admin panel. So we could consider the Biscuit token as a fancy JWT that only contains data (so, in Datalog, facts).</p>
<p>Our token will contain this fact: <code>user(#authority, &quot;user_1234&quot;)</code>.</p>
<p>Here, <code>&quot;user_1234&quot;</code> is our user id, and <code>#authority</code> is a special symbol that can only be added to facts in the first block of a token (or added by the verifier). A block contains facts (data), rules (to generate facts) and checks (queries used to validate the facts). Attenuation is done by adding more blocks. Since <code>#authority</code> facts are about the basic rights of a token, adding <code>#authority</code> facts would increase the number of rights. So we forbid adding <code>#authority</code> facts in additional blocks. Symbols, as indicated by the <code>#</code> prefix, are special strings that are internally replaced with integers, to compress tokens and accelerate evaluation.</p>
<p>The token can be serialized to a byte array (encoded with Protobuf) and then to base64 if we want to carry it in a cookie.</p>
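<p>To illustrate what that last step produces, here is a minimal URL-safe base64 encoder in plain Rust; a real application would use the <code>base64</code> crate, as the program at the end of this article does:</p>

```rust
// Minimal URL-safe base64 encoder (RFC 4648 alphabet with '-' and '_'),
// just to show what "serialize then base64" means. Real code should use
// the base64 crate instead of this sketch.
fn base64_url(data: &[u8]) -> String {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let mut buf = [0u8; 3];
        buf[..chunk.len()].copy_from_slice(chunk);
        // pack up to 3 bytes into a 24-bit number, then emit 6-bit groups
        let n = (buf[0] as u32) << 16 | (buf[1] as u32) << 8 | buf[2] as u32;
        let groups = [n >> 18, (n >> 12) & 63, (n >> 6) & 63, n & 63];
        for (i, &g) in groups.iter().enumerate() {
            if i <= chunk.len() {
                out.push(ALPHABET[g as usize] as char);
            } else {
                out.push('='); // padding for short final chunks
            }
        }
    }
    out
}

fn main() {
    println!("{}", base64_url(b"hello")); // prints aGVsbG8=
}
```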
<p>On the blog engine&#39;s side, we will only have this single line:</p>
<pre><code class="language-prolog">allow if user(#authority, &quot;user_1234&quot;);
</code></pre>
<p>Biscuit can enforce authorization in 2 ways:</p>
<ul>
<li>checks, starting with <code>check if</code></li>
<li>allow/deny policies, starting with <code>allow if</code> or <code>deny if</code></li>
</ul>
<p>They work a bit like rules: if there&#39;s at least one combination of facts in the body (after the <code>if</code>) that fits, then it matches. Unlike rules, they do not produce any facts.</p>
<p>To validate a token:</p>
<ul>
<li>all of the checks must match. If one does not, fail</li>
<li>allow/deny policies are tried in order until one matches<ul>
<li>if allow matches, succeed</li>
<li>if deny matches, fail</li>
</ul>
</li>
<li>if none match, fail</li>
</ul>
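<p>The validation steps above can be sketched as a small Rust function. The <code>Policy</code> enum and <code>validate</code> helper are hypothetical, not the biscuit crate&#39;s API; each check and each policy body is reduced here to a boolean saying whether it matched:</p>

```rust
#[derive(Clone, Copy)]
enum Policy {
    Allow,
    Deny,
}

// Sketch of the validation algorithm described above. `checks` holds the
// result of each check; `policies` pairs each allow/deny policy with
// whether its body matched. Hypothetical types, not the biscuit crate API.
fn validate(checks: &[bool], policies: &[(Policy, bool)]) -> bool {
    // all of the checks must match
    if checks.iter().any(|&c| !c) {
        return false;
    }
    // allow/deny policies are tried in order until one matches
    for &(policy, matched) in policies {
        if matched {
            return match policy {
                Policy::Allow => true,
                Policy::Deny => false,
            };
        }
    }
    // if none match, fail
    false
}

fn main() {
    // one failing check rejects the request even if an allow policy matches
    assert!(!validate(&[true, false], &[(Policy::Allow, true)]));
    // otherwise the first matching policy decides
    assert!(validate(&[true], &[(Policy::Allow, true), (Policy::Deny, true)]));
    // no matching policy at all: fail
    assert!(!validate(&[true], &[(Policy::Allow, false)]));
}
```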
<p>Here the allow policy will succeed if the token contains the fact <code>user(#authority, &quot;user_1234&quot;)</code>.</p>
<p>It is not very useful yet, but maybe we can add more features?</p>
<h3 id="next-multi-blog-platform">Next: multi-blog platform</h3>
<p>After a few friends have seen your marvelous website, they ask if you could host their blogs on the same platform. So now you need more flexible authorization rules. We could keep the small tokens with the user id, but add more intelligence on the server&#39;s side.</p>
<p>First we need to indicate who owns which blog, with the format <code>owner(#authority, $user_id, $blog_id)</code>. You can load this data when creating the verifier, from your database, from static files, etc.</p>
<pre><code class="language-prolog">owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
</code></pre>
<p>Here we own <code>&quot;blog1&quot;</code> and <code>&quot;blog3&quot;</code>, and <code>&quot;user_5678&quot;</code> owns <code>&quot;blog2&quot;</code>.</p>
<p>Now we need to actually validate the request, to see who has access to what. The request is represented through the <code>#ambient</code> facts, added to the verifier: you indicate to the verifier facts representing the current request like which resource is accessed, which operation (read, write, etc), the current time, the source IP address, etc. As an example, a <code>PUT /blog1/article1</code> to modify an article could be translated as:</p>
<pre><code class="language-prolog">blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #update);
</code></pre>
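<p>As a sketch, that translation from an HTTP request to <code>#ambient</code> facts could look like this in Rust; the <code>ambient_facts</code> helper and the method-to-operation mapping are assumptions, following the article&#39;s examples:</p>

```rust
// Hypothetical translation layer turning an HTTP request into the
// #ambient facts fed to the verifier. The method-to-operation mapping
// is an assumption, following the article's examples.
fn ambient_facts(method: &str, blog_id: &str, article_id: &str) -> Vec<String> {
    let operation = match method {
        "GET" => "#read",
        "PUT" => "#update",
        "POST" => "#create",
        "DELETE" => "#delete",
        _ => "#unknown",
    };
    vec![
        format!("blog(#ambient, \"{}\");", blog_id),
        format!("article(#ambient, \"{}\", \"{}\");", blog_id, article_id),
        format!("operation(#ambient, {});", operation),
    ]
}

fn main() {
    // PUT /blog1/article1 becomes the three facts shown above
    for fact in ambient_facts("PUT", "blog1", "article1") {
        println!("{}", fact);
    }
}
```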
<p>In the verifier, we add a rule to indicate that the owner of a blog has full rights on it:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);
</code></pre>
<p>If this rule finds a matching set of facts, it will produce a <code>right(...)</code> fact.</p>
<p>The verifier will also use an allow policy for the presence of that <code>right</code> (you will see why we separate them in the next section):</p>
<pre><code class="language-prolog">allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);

// unauthenticated users have read access
allow if
  operation(#ambient, #read);

// catch all rule in case the allow did not match
deny if true;
</code></pre>
<p>So if we tried to do a <code>PUT /blog1/article1</code> with the token containing <code>user(#authority, &quot;user_1234&quot;)</code>, we would end up with the following facts:</p>
<pre><code class="language-prolog">user(#authority, &quot;user_1234&quot;);
blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #update);
owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
</code></pre>
<p>If we applied the verifier&#39;s rule, we would end up with:</p>
<pre><code class="language-prolog">right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update) &lt;-
    owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;),
    article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
    user(#authority, &quot;user_1234&quot;),
    operation(#ambient, #update);
</code></pre>
<p>So we end up with the new fact <code>right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update)</code>.</p>
<p>Now the verifier applies the check:</p>
<pre><code class="language-prolog">allow if
  blog(#ambient, &quot;blog1&quot;),
  article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
  operation(#ambient, #update),
  right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update);
</code></pre>
<p>And the test succeeds! If we had tried the request with a token containing <code>user(#authority, &quot;user_5678&quot;)</code>, the rule would not have produced the <code>right()</code> fact, and it would have failed.</p>
<p>Now if we did a <code>GET /blog1/article1</code> request, without being the owner of the blog, we would have matched <code>allow if operation(#ambient, #read)</code>.</p>
<p>But maybe we don&#39;t want all articles to be available by default: some of them may still be drafts. So let&#39;s remove that allow policy. Instead, we want to mark an article as publicly readable by creating the fact <code>readable(#authority, $blog_id, $article_id)</code>. We can do that with this policy:</p>
<pre><code class="language-prolog">allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);
</code></pre>
<p>So if we did a <code>GET /blog1/article1</code> request with that article marked as readable, we would get the facts:</p>
<pre><code class="language-prolog">blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #read);
owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
readable(#authority, &quot;blog1&quot;, &quot;article1&quot;);
</code></pre>
<p>The test would apply as follows:</p>
<pre><code class="language-prolog">allow if
  operation(#ambient, #read),
  article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
  readable(#authority, &quot;blog1&quot;, &quot;article1&quot;);
</code></pre>
<p>And we get access. In a few lines, we created basic rules to protect our blog platform. But users need more features!</p>
<h3 id="add-reviewers">Add reviewers</h3>
<p>Often, we&#39;d like to ask friends and colleagues to review articles before they are published. In our system, it could be done in two ways:</p>
<ul>
<li>mint a token containing only <code>right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #read)</code></li>
<li>derive the user&#39;s token, adding a check restricting to the article</li>
</ul>
<p>In the second case, the token would look like this:</p>
<pre><code class="language-text">Block 0 (authority):
  facts: [ user(#authority, &quot;user_1234&quot;) ]
  rules: []
  checks: []

Block 1:
  facts: []
  rules: []
  check: [
    check if article(#ambient, &quot;blog1&quot;, &quot;article1&quot;), operation(#ambient, #read)
  ]
</code></pre>
<p>if we tried to do a <code>PUT /blog1/article1</code>, the verifier&#39;s checks would succeed, but the token&#39;s check would fail, because it does not find the <code>operation(#ambient, #read)</code> fact. But for a <code>GET /blog1/article1</code>, all checks would succeed. The reviewer will not be able to remove the block while keeping a valid signature, so any alteration will result in a failed request.</p>
<h3 id="premium-accounts">Premium accounts</h3>
<p>Now some of the blog authors want to make a living out of it (come on, it&#39;s 2021, do a newsletter instead) and mark some articles as &quot;premium&quot;, so that only some users can access them.</p>
<p>We can do that by having <code>premium_user(#authority, $user_id, $blog_id)</code> facts and adding a rule on the verifier&#39;s side:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);
</code></pre>
<p>We could even add a feature like <a href="https://lwn.net/">LWN.net</a> where a paying user can share a premium article, by deriving their tokens to only accept that article.</p>
<h3 id="were-a-big-newspaper-now-we-want-roles-and-teams">We&#39;re a big newspaper now, we want roles and teams</h3>
<p>Against all odds, our blog platform is a smashing success. We need to recruit journalists, editors, copywriters... So now we might need more flexible rights management, maybe some teams and roles?</p>
<p>Let&#39;s define more facts and rules to encode that. As an example, let&#39;s define a &quot;contributor&quot; role that can only read or update articles, while owners are the only ones who can create or delete.</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  contributor(#authority, $user_id, $blog_id),
  [#read, #update].contains($operation);
</code></pre>
<p>What you can see on the last line is an <em>expression</em>: Biscuit&#39;s Datalog implementation can require additional conditions on some values, like a string matching a regular expression, or a date being lower than an expiration date, or here, presence in a set. This rule will only produce a <code>right</code> fact if the operation is <code>#read</code> or <code>#update</code>.</p>
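<p>The set expression is just a membership test. As a sketch, here is the filtering it performs, with a hypothetical <code>contributor_right</code> helper standing in for the rule:</p>

```rust
// Sketch of the contributor rule above: the set expression simply
// filters which operations can produce a `right` fact. Hypothetical
// helper, not the biscuit crate's API.
fn contributor_right(blog_id: &str, article_id: &str, operation: &str) -> Option<String> {
    let allowed = ["#read", "#update"]; // the set in the expression
    if allowed.contains(&operation) {
        Some(format!(
            "right(#authority, \"{}\", \"{}\", {})",
            blog_id, article_id, operation
        ))
    } else {
        None // e.g. #delete never matches, so no right is produced
    }
}

fn main() {
    assert!(contributor_right("blog1", "article1", "#update").is_some());
    assert!(contributor_right("blog1", "article1", "#delete").is_none());
}
```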
<p>Now, we want to define contributor teams to manage them more easily. So we will introduce the <code>team(#authority, $team_id)</code>, <code>member(#authority, $user_id, $team_id)</code> and <code>team_role(#authority, $team_id, $blog_id, #contributor)</code> facts.</p>
<p>Additionally, we insert this rule in the verifier:</p>
<pre><code class="language-prolog">contributor(#authority, $user_id, $blog_id) &lt;-
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor);
</code></pre>
<p>This rule will generate the <code>contributor</code> fact for a blog if we are a member of a team that has the &quot;contributor&quot; team role.</p>
<p>We could also fold the two preceding rules into one:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);
</code></pre>
<p>And that&#39;s it! With a few rules, we can model more and more complex authorization patterns, some of them relying on user-provided policies, without compromising the previous features. Rules are additive, so there&#39;s no need for a long chain of if/else statements and special cases hardcoded in some endpoints. Everything can be managed in one place.</p>
<p>To sum up the rules of our system:</p>
<pre><code class="language-prolog">// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;
</code></pre>
<p>And here is an example Rust program reproducing this authorization system:</p>
<pre><code class="language-rust">use biscuit::{crypto::KeyPair, error, token::Biscuit, parser::parse_source};
use biscuit_auth as biscuit;

fn main() -&gt; Result&lt;(), error::Token&gt; {
    let start = std::time::Instant::now();

    // First, let&#39;s create the root key for the system
    // its public part will be used to verify the token
    let root = KeyPair::new();

    // Token creation
    // we will add a single fact indicating identity
    let mut builder = Biscuit::builder(&amp;root);
    builder.add_authority_fact(&quot;user(#authority, \&quot;user_1234\&quot;)&quot;)?;

    let token = builder.build()?;
    println!(&quot;{}&quot;, token.print());
    let token_bytes = token.to_vec()?;
    let serialized = base64::encode_config(&amp;token_bytes, base64::URL_SAFE);
    println!(&quot;serialized ({} bytes): {}&quot;, token_bytes.len(), serialized);

    let deserialized_token = Biscuit::from(&amp;token_bytes)?;
    // Token verification
    // first, we validate the signature with the root public key
    let mut verifier = deserialized_token.verify(root.public())?;

    // simulate verification for PUT /blog1/article1
    verifier.add_fact(&quot;blog(#ambient, \&quot;blog1\&quot;)&quot;)?;
    verifier.add_fact(&quot;article(#ambient, \&quot;blog1\&quot;, \&quot;article1\&quot;)&quot;)?;
    verifier.add_fact(&quot;operation(#ambient, #update)&quot;)?;

    // add ownership information
    // we only need to load facts related to the blog and article we&#39;re accessing
    verifier.add_fact(&quot;owner(#authority, \&quot;user_1234\&quot;, \&quot;blog1\&quot;)&quot;)?;
    //verifier.add_fact(&quot;owner(#authority, \&quot;user_5678\&quot;, \&quot;blog2\&quot;)&quot;)?;
    //verifier.add_fact(&quot;owner(#authority, \&quot;user_1234\&quot;, \&quot;blog3\&quot;)&quot;)?;

    let (_remaining_input, mut policies) = parse_source(&quot;
// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;
    &quot;).unwrap();

    for (_span, fact) in policies.facts.drain(..) {
        verifier.add_fact(fact)?;
    }

    for (_span, rule) in policies.rules.drain(..) {
        verifier.add_rule(rule)?;
    }

    for (_span, check) in policies.checks.drain(..) {
        verifier.add_check(check)?;
    }

    for (_span, policy) in policies.policies.drain(..) {
        verifier.add_policy(policy)?;
    }

    let res = verifier.verify()?;
    let dur = std::time::Instant::now() - start;
    //println!(&quot;res: {:?}&quot;, res);
    println!(&quot;{}&quot;, verifier.print_world());

    println!(&quot;ran in {:?}&quot;, dur);
    Ok(())
}
</code></pre>
<p>The entire program (key generation, token creation, serialization, deserialization, signature validation and facts verification) <strong>runs in 0.5 ms</strong>. So even with all of these features, Biscuit is fast enough to get out of your way.</p>
<h2 id="whats-next">What&#39;s next</h2>
<p>You can already start using Biscuit in <a href="https://github.com/clevercloud/biscuit-rust">Rust</a>, <a href="https://github.com/clevercloud/biscuit-java">Java</a> and <a href="https://github.com/flynn/biscuit-go">Go</a>.</p>
<p>The Rust version can also generate C bindings, currently used to develop a <a href="https://github.com/divarvel/biscuit-haskell">Haskell version</a>, and there is a <a href="https://github.com/clevercloud/biscuit-wasm">WebAssembly wrapper</a>.</p>
<p>As an example integration, you can check out a <a href="https://github.com/clevercloud/biscuit-pulsar">Biscuit based authorization plugin</a> for <a href="https://pulsar.apache.org/">Apache Pulsar</a>.</p>
<p>The <a href="https://github.com/clevercloud/biscuit">specification</a> is developed in the open, you can contribute.</p>
<script>
$("table").addClass("table-bordered");
</script>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-tutorial-1-1368x528.png 1368w" sizes="(max-width: 1400px) 100vw, 1400px" /></p><p>In the <a href="https://stagingv6.cleverapps.io/blog/engineering/2021/04/12/introduction-to-biscuit/">previous article</a>, I introduced Biscuit, our authentication and authorization token, and mentioned its Datalog based language for authorization policies. Let&#39;s see how it works!</p>
<span id="more-2833"></span>

<h2 id="from-a-personal-blog-to-an-entire-newspaper">From a personal blog to an entire newspaper</h2>
<p>As an example, we will build up authorization policies, going from a small, personal blog, to a professional journal with multiple teams, editors, etc.</p>
<p>Since those policies will be written in Datalog, let&#39;s take a short look at that language first.</p>
<h3 id="side-note-introduction-to-datalog">Side note: introduction to Datalog</h3>
<p>Datalog is a declarative logic language that is a subset of Prolog. A Datalog program contains &quot;facts&quot;, which represent data, and &quot;rules&quot;, which can generate new facts from existing ones.</p>
<p>As an example, we could define the following facts, describing some relationships:</p>
<pre><code class="language-prolog">parent(&quot;Alice&quot;, &quot;Bob&quot;);
parent(&quot;Bob&quot;, &quot;Charles&quot;);
parent(&quot;Charles&quot;, &quot;Denise&quot;);
</code></pre>
<p>This means that Alice is Bob&#39;s parent, and so on.</p>
<p>This could be seen as a table in a relational database:</p>
<table class="table">
<thead>
<tr>
<th>parent</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody><tr>
<td></td>
<td>Alice</td>
<td>Bob</td>
</tr>
<tr>
<td></td>
<td>Bob</td>
<td>Charles</td>
</tr>
<tr>
<td></td>
<td>Charles</td>
<td>Denise</td>
</tr>
</tbody></table>
<p>We can then define rules to query our data:</p>
<pre><code class="language-prolog">parent_of_charles($name) &lt;-
  parent($name, &quot;Charles&quot;);
</code></pre>
<p>This could be written in SQL as:</p>
<pre><code class="language-sql">SELECT DISTINCT name from parent where child = &quot;Charles&quot;;
</code></pre>
<p>(we use <code>DISTINCT</code> because Datalog will always remove redundant results)</p>
<p>We can also use rules to create new facts, like this one: (variables are introduced with the <code>$</code> sign)</p>
<pre><code class="language-prolog">grandparent($grandparent, $child) &lt;-
  parent($grandparent, $parent),
  parent($parent, $child);
</code></pre>
<p>You can read it as follows:</p>
<pre><code class="language-text">create the fact grandparent($grandparent, $child)
  IF
    there is a fact parent($grandparent, $parent)
    AND there is a fact parent($parent, $child)
    with matching $parent variable
</code></pre>
<p>or in SQL:</p>
<pre><code class="language-sql">INSERT INTO grandparent( name, grandchild )
  SELECT A.name as name, B.child as grandchild
  FROM parent A, parent B
  WHERE A.child = B.name;
</code></pre>
<p>Applying this rule will look at combinations of the <code>parent</code> facts as defined on the right side of the arrow (the &quot;body&quot; of the rule), and try to match them to the variables (<code>$grandparent</code>, <code>$parent</code>, <code>$child</code>):</p>
<ul>
<li><code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Bob&quot;, &quot;Charles&quot;)</code> matches because we can
replace <code>$grandparent</code> with <code>&quot;Alice&quot;</code>, <code>$parent</code> with <code>&quot;Bob&quot;</code>, <code>$child</code> with <code>&quot;Charles&quot;</code></li>
<li><code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Charles&quot;, &quot;Denise&quot;)</code> does not match because
we would get different values for the <code>$parent</code> variable</li>
</ul>
<p>For each matching combination of facts in the body, we will then generate a fact, as defined on the left side of the arrow, the <em>head</em> of the rule. For <code>parent(&quot;Alice&quot;, &quot;Bob&quot;), parent(&quot;Bob&quot;, &quot;Charles&quot;)</code>, we would generate <code>grandparent(&quot;Alice&quot;, &quot;Charles&quot;)</code>. A fact can be generated from multiple rules, but we will get only one instance of it.</p>
<p>Going through all the combinations, we will generate:</p>
<pre><code class="language-prolog">grandparent(&quot;Alice&quot;, &quot;Charles&quot;);
grandparent(&quot;Bob&quot;, &quot;Denise&quot;);
</code></pre>
<p>which can be seen as:</p>
<table class="table">
<thead>
<tr>
<th colspan="2">grandparent</th>
</tr>
</thead>
<tbody><tr>
<td>Alice</td>
<td>Charles</td>
</tr>
<tr>
<td>Bob</td>
<td>Denise</td>
</tr>
</tbody></table>
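<p>The fact generation above is essentially a join over the <code>parent</code> facts. As a toy illustration (plain Rust over tuples of strings, not the actual Biscuit engine), the matching on the <code>$parent</code> variable can be sketched like this:</p>

```rust
use std::collections::HashSet;

// Derive grandparent facts from parent facts by joining on the middle name,
// mirroring: grandparent($grandparent, $child) <-
//              parent($grandparent, $parent), parent($parent, $child);
fn grandparents(parents: &[(&str, &str)]) -> HashSet<(String, String)> {
    let mut out = HashSet::new();
    for (g, p1) in parents {
        for (p2, c) in parents {
            // the $parent variable must take the same value in both facts
            if p1 == p2 {
                out.insert((g.to_string(), c.to_string()));
            }
        }
    }
    out
}

fn main() {
    let parents = [("Alice", "Bob"), ("Bob", "Charles"), ("Charles", "Denise")];
    // generates grandparent("Alice", "Charles") and grandparent("Bob", "Denise")
    for (g, c) in grandparents(&parents) {
        println!("grandparent(\"{}\", \"{}\");", g, c);
    }
}
```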
<p>Interactions with a Datalog program are done through queries: <strong>a query contains a rule</strong> that we apply over the system, and <strong>it returns the generated facts</strong>.</p>
<h3 id="first-steps-personal-blog">First steps: personal blog</h3>
<p><em>Note: you can follow along with the various steps of this tutorial in the <a href="https://play-with-biscuit.cleverapps.io/blog.html">online playground</a>.</em></p>
<p>When we are the only user of that blog, we do not need much (honestly, we could get away with just a random string in a cookie, but bear with me). We only need a way to identify ourselves to the blog engine&#39;s admin panel. So we could just consider the Biscuit token as a fancy JWT that will only contain data (so, in Datalog, facts).</p>
<p>Our token will contain this fact: <code>user(#authority, &quot;user_1234&quot;)</code>.</p>
<p>Here, <code>&quot;user_1234&quot;</code> is our user id, and <code>#authority</code> is a special symbol that can only be added to facts in the first block of a token (or added by the verifier). A block contains facts (data), rules (to generate facts) and checks (queries used to validate the facts). Attenuation is done by adding more blocks. Since <code>#authority</code> facts are about the basic rights of a token, adding <code>#authority</code> facts would increase the number of rights. So we forbid adding <code>#authority</code> facts in additional blocks. Symbols, as indicated by the <code>#</code> prefix, are special strings that are internally replaced with integers, to compress tokens and accelerate evaluation.</p>
<p>The token can be serialized to a byte array (encoded with Protobuf) and then to base64 if we want to carry it in a cookie.</p>
<p>On the blog engine&#39;s side, we will only have this single line:</p>
<pre><code class="language-prolog">allow if user(#authority, &quot;user_1234&quot;);
</code></pre>
<p>Biscuit can enforce authorization in 2 ways:</p>
<ul>
<li>checks, starting with <code>check if</code></li>
<li>allow/deny policies, starting with <code>allow if</code> or <code>deny if</code></li>
</ul>
<p>They work a bit like rules: if there&#39;s at least one combination of facts in the body (after the <code>if</code>) that fits, then it matches. They do not produce any facts.</p>
<p>To validate a token:</p>
<ul>
<li>all of the checks must match. If one does not, fail</li>
<li>allow/deny policies are tried in order until one matches<ul>
<li>if allow matches, succeed</li>
<li>if deny matches, fail</li>
</ul>
</li>
<li>if none match, fail</li>
</ul>
<p>Here, the allow policy will succeed if the token contains the fact <code>user(#authority, &quot;user_1234&quot;)</code>.</p>
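<p>The validation procedure above can be condensed into a small decision function. Here is a sketch in plain Rust (a toy model where each check and policy has already been evaluated down to &quot;did its body match?&quot;, which is not how the library represents them):</p>

```rust
// All checks must match; then allow/deny policies are tried in order,
// and the first matching one decides. No matching policy means failure.
// A policy is (is_allow, matched): `allow if` => true, `deny if` => false.
fn validate(checks: &[bool], policies: &[(bool, bool)]) -> bool {
    // a single failing check rejects the request
    if checks.iter().any(|&matched| !matched) {
        return false;
    }
    // policies are tried in order until one matches
    for &(is_allow, matched) in policies {
        if matched {
            return is_allow;
        }
    }
    // if no policy matches, fail
    false
}

fn main() {
    // one passing check, then a matching `allow if` -> authorized
    assert!(validate(&[true], &[(true, true), (false, true)]));
    // a failing check rejects even if an allow policy would match
    assert!(!validate(&[false, true], &[(true, true)]));
    // nothing matches -> rejected (hence the catch-all `deny if true`)
    assert!(!validate(&[true], &[(true, false)]));
    println!("ok");
}
```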
<p>It is not very useful yet, but maybe we can add more features?</p>
<h3 id="next-multi-blog-platform">Next: multi-blog platform</h3>
<p>After a few friends have seen your marvelous website, they ask if you could host their blogs on the same platform. So now you need more flexible authorization rules. We could keep the small tokens with the user id, but add more intelligence on the server&#39;s side.</p>
<p>First we need to indicate who owns which blog, with the format <code>owner(#authority, $user_id, $blog_id)</code>. You can load this data when creating the verifier, from your database, from static files, etc.</p>
<pre><code class="language-prolog">owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
</code></pre>
<p>Here we own <code>&quot;blog1&quot;</code> and <code>&quot;blog3&quot;</code>, and <code>&quot;user_5678&quot;</code> owns <code>&quot;blog2&quot;</code>.</p>
<p>Now we need to actually validate the request, to see who has access to what. The request is represented through the <code>#ambient</code> facts added to the verifier: facts describing the current request, like which resource is accessed, which operation is performed (read, write, etc.), the current time, or the source IP address. As an example, a <code>PUT /blog1/article1</code> request to modify an article could be translated as:</p>
<pre><code class="language-prolog">blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #update);
</code></pre>
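<p>How an HTTP request gets translated into <code>#ambient</code> facts is entirely up to the application. A hypothetical mapping for this blog platform could look like this (the <code>request_to_facts</code> helper and the <code>/&lt;blog_id&gt;/&lt;article_id&gt;</code> path layout are our own assumptions, not part of the Biscuit API):</p>

```rust
// Hypothetical translation of an HTTP request into ambient facts;
// the path layout /<blog_id>/<article_id> is specific to this example.
fn request_to_facts(method: &str, path: &str) -> Vec<String> {
    let mut parts = path.trim_start_matches('/').splitn(2, '/');
    let blog_id = parts.next().unwrap_or("");
    let article_id = parts.next().unwrap_or("");
    let operation = match method {
        "GET" => "#read",
        "PUT" => "#update",
        "POST" => "#create",
        "DELETE" => "#delete",
        _ => "#unknown",
    };
    vec![
        format!("blog(#ambient, \"{}\")", blog_id),
        format!("article(#ambient, \"{}\", \"{}\")", blog_id, article_id),
        format!("operation(#ambient, {})", operation),
    ]
}

fn main() {
    for fact in request_to_facts("PUT", "/blog1/article1") {
        println!("{};", fact);
    }
}
```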
<p>In the verifier, we add a rule to indicate that the owner of a blog has full rights on it:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);
</code></pre>
<p>If this rule finds a matching set of facts, it will produce a <code>right(...)</code> fact.</p>
<p>The verifier will also use an allow policy that checks for the presence of that <code>right</code> fact (you will see why we separate them in the next section):</p>
<pre><code class="language-prolog">allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);

// unauthenticated users have read access
allow if
  operation(#ambient, #read);

// catch all rule in case the allow did not match
deny if true;
</code></pre>
<p>So if we tried to do a <code>PUT /blog1/article1</code> with the token containing <code>user(#authority, &quot;user_1234&quot;)</code>, we would end up with the following facts:</p>
<pre><code class="language-prolog">user(#authority, &quot;user_1234&quot;);
blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #update);
owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
</code></pre>
<p>If we applied the verifier&#39;s rule, we would end up with:</p>
<pre><code class="language-prolog">right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update) &lt;-
    owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;),
    article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
    user(#authority, &quot;user_1234&quot;),
    operation(#ambient, #update);
</code></pre>
<p>So we end up with the new fact <code>right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update)</code>.</p>
<p>Now the verifier applies the allow policy:</p>
<pre><code class="language-prolog">allow if
  blog(#ambient, &quot;blog1&quot;),
  article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
  operation(#ambient, #update),
  right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #update);
</code></pre>
<p>And the test succeeds! If we had tried the request with a token containing <code>user(#authority, &quot;user_5678&quot;)</code>, the rule would not have produced the <code>right()</code> fact, and it would have failed.</p>
<p>Now if we did a <code>GET /blog1/article1</code> request, without being the owner of the blog, we would have matched <code>allow if operation(#ambient, #read)</code>.</p>
<p>But maybe we don&#39;t want all articles to be publicly available by default: some of them might still be in progress. So let&#39;s remove that allow policy, and instead mark an article as publicly readable by creating the fact <code>readable(#authority, $blog_id, $article_id)</code>. We can then use this policy:</p>
<pre><code class="language-prolog">allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);
</code></pre>
<p>So if we did a <code>GET /blog1/article1</code> request with that article marked as readable, we would get the facts:</p>
<pre><code class="language-prolog">blog(#ambient, &quot;blog1&quot;);
article(#ambient, &quot;blog1&quot;, &quot;article1&quot;);
operation(#ambient, #read);
owner(#authority, &quot;user_1234&quot;, &quot;blog1&quot;);
owner(#authority, &quot;user_5678&quot;, &quot;blog2&quot;);
owner(#authority, &quot;user_1234&quot;, &quot;blog3&quot;);
readable(#authority, &quot;blog1&quot;, &quot;article1&quot;);
</code></pre>
<p>The policy would apply as follows:</p>
<pre><code class="language-prolog">allow if
  operation(#ambient, #read),
  article(#ambient, &quot;blog1&quot;, &quot;article1&quot;),
  readable(#authority, &quot;blog1&quot;, &quot;article1&quot;);
</code></pre>
<p>And we got access. In a few lines, we created basic rules to protect our blog platform. But users need more features!</p>
<h3 id="add-reviewers">Add reviewers</h3>
<p>Often, we&#39;d like to ask friends and colleagues to review articles before they are published. In our system, it could be done in two ways:</p>
<ul>
<li>mint a token containing only <code>right(#authority, &quot;blog1&quot;, &quot;article1&quot;, #read)</code></li>
<li>derive the user&#39;s token, adding a check restricting to the article</li>
</ul>
<p>In the second case, the token would look like this:</p>
<pre><code class="language-text">Block 0 (authority):
  facts: [ user(#authority, &quot;user_1234&quot;) ]
  rules: []
  checks: []

Block 1:
  facts: []
  rules: []
  check: [
    check if article(#ambient, &quot;blog1&quot;, &quot;article1&quot;), operation(#ambient, #read)
  ]
</code></pre>
<p>If we tried to do a <code>PUT /blog1/article1</code>, the verifier&#39;s checks would succeed, but the token&#39;s check would fail, because it does not find the <code>operation(#ambient, #read)</code> fact. For a <code>GET /blog1/article1</code>, however, all checks would succeed. The reviewer cannot remove the block while keeping a valid signature, so any alteration will result in a failed request.</p>
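<p>To make this attenuation behavior concrete, here is a toy model in plain Rust (again, not the biscuit-auth library: a check is reduced to a set of facts that must all be present among the ambient facts):</p>

```rust
use std::collections::HashSet;

// Every block's checks must match; a single failing check rejects the request.
fn token_accepts(block_checks: &[Vec<&str>], ambient: &HashSet<&str>) -> bool {
    block_checks
        .iter()
        .all(|check| check.iter().all(|fact| ambient.contains(fact)))
}

fn main() {
    // Block 1 of the reviewer token restricts it to reading blog1/article1
    let checks = vec![vec![
        "article(#ambient, \"blog1\", \"article1\")",
        "operation(#ambient, #read)",
    ]];

    // GET /blog1/article1 -> the attenuated token's check matches
    let read: HashSet<&str> = [
        "article(#ambient, \"blog1\", \"article1\")",
        "operation(#ambient, #read)",
    ].into_iter().collect();
    assert!(token_accepts(&checks, &read));

    // PUT /blog1/article1 -> operation(#ambient, #read) is missing, so it fails
    let update: HashSet<&str> = [
        "article(#ambient, \"blog1\", \"article1\")",
        "operation(#ambient, #update)",
    ].into_iter().collect();
    assert!(!token_accepts(&checks, &update));
    println!("ok");
}
```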
<h3 id="premium-accounts">Premium accounts</h3>
<p>Now some of the blog authors want to make a living out of it (come on, it&#39;s 2021, do a newsletter instead) and mark some articles as &quot;premium&quot;, so that only some users can access them.</p>
<p>We can do that by having <code>premium_user(#authority, $user_id, $blog_id)</code> facts and adding a rule on the verifier&#39;s side:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);
</code></pre>
<p>We could even add a feature like <a href="https://lwn.net/">LWN.net</a> where a paying user can share a premium article, by deriving their tokens to only accept that article.</p>
<h3 id="were-a-big-newspaper-now-we-want-roles-and-teams">We&#39;re a big newspaper now, we want roles and teams</h3>
<p>Against all odds, our blog platform is a smashing success. We need to recruit journalists, editors, copywriters... So now we need more flexible rights management, maybe some teams and roles?</p>
<p>Let&#39;s define more facts and rules to encode that. As an example, let&#39;s define a &quot;contributor&quot; role that can only read or write articles, while owners are the only ones who can create or delete.</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  contributor(#authority, $user_id, $blog_id),
  [#read, #update].contains($operation);
</code></pre>
<p>What you can see on the last line is an <em>expression</em>: Biscuit&#39;s Datalog implementation can require additional conditions on some values, like a string matching a regular expression, a date being lower than an expiration date, or, here, presence in a set. This rule will only produce a fact if the operation is <code>#read</code> or <code>#update</code>.</p>
<p>Now, we want to define contributor teams to manage them more easily. So we will introduce the <code>team(#authority, $team_id)</code>, <code>member(#authority, $user_id, $team_id)</code> and <code>team_role(#authority, $team_id, $blog_id, #contributor)</code> facts.</p>
<p>Additionally, we insert this rule in the verifier:</p>
<pre><code class="language-prolog">contributor(#authority, $user_id, $blog_id) &lt;-
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor);
</code></pre>
<p>This rule will generate the <code>contributor</code> fact for a blog if we are a member of a team that has the &quot;contributor&quot; role on it.</p>
<p>We could also fold the two preceding rules into one:</p>
<pre><code class="language-prolog">right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);
</code></pre>
<p>And that&#39;s it! With a few rules, we can model more and more complex authorization patterns, some of them relying on user-provided policies, without compromising the previous features. Rules are additive, so there&#39;s no need for a long chain of if/else statements and special cases hardcoded in some endpoints. Everything can be managed in one place.</p>
<p>To sum up the rules of our system:</p>
<pre><code class="language-prolog">// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;
</code></pre>
<p>And here is an example Rust program reproducing this authorization system:</p>
<pre><code class="language-rust">use biscuit::{crypto::KeyPair, error, token::Biscuit, parser::parse_source};
use biscuit_auth as biscuit;

fn main() -&gt; Result&lt;(), error::Token&gt; {
    let start = std::time::Instant::now();

    // First, let&#39;s create the root key for the system;
    // its public part will be used to verify the token
    let root = KeyPair::new();

    // Token creation
    // we will add a single fact indicating identity
    let mut builder = Biscuit::builder(&amp;root);
    builder.add_authority_fact(&quot;user(#authority, \&quot;user_1234\&quot;)&quot;)?;

    let token = builder.build()?;
    println!(&quot;{}&quot;, token.print());
    let token_bytes = token.to_vec()?;
    let serialized = base64::encode_config(&amp;token_bytes, base64::URL_SAFE);
    println!(&quot;serialized ({} bytes): {}&quot;, token_bytes.len(), serialized);

    let deserialized_token = Biscuit::from(&amp;token_bytes)?;
    // Token verification
    // first, we validate the signature with the root public key
    let mut verifier = deserialized_token.verify(root.public())?;

    // simulate verification for PUT /blog1/article1
    verifier.add_fact(&quot;blog(#ambient, \&quot;blog1\&quot;)&quot;)?;
    verifier.add_fact(&quot;article(#ambient, \&quot;blog1\&quot;, \&quot;article1\&quot;)&quot;)?;
    verifier.add_fact(&quot;operation(#ambient, #update)&quot;)?;

    // add ownership information
    // we only need to load facts related to the blog and article we&#39;re accessing
    verifier.add_fact(&quot;owner(#authority, \&quot;user_1234\&quot;, \&quot;blog1\&quot;)&quot;)?;
    //verifier.add_fact(&quot;owner(#authority, \&quot;user_5678\&quot;, \&quot;blog2\&quot;)&quot;)?;
    //verifier.add_fact(&quot;owner(#authority, \&quot;user_1234\&quot;, \&quot;blog3\&quot;)&quot;)?;

    let (_remaining_input, mut policies) = parse_source(&quot;
// the owner has all rights
right(#authority, $blog_id, $article_id, $operation) &lt;-
    article(#ambient, $blog_id, $article_id),
    operation(#ambient, $operation),
    user(#authority, $user_id),
    owner(#authority, $user_id, $blog_id);

// premium users can access some restricted articles
right(#authority, $blog_id, $article_id, #read) &lt;-
  article(#ambient, $blog_id, $article_id),
  premium_readable(#authority, $blog_id, $article_id),
  user(#authority, $user_id),
  premium_user(#authority, $user_id, $blog_id);

// define teams and roles
right(#authority, $blog_id, $article_id, $operation) &lt;-
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  user(#authority, $user_id),
  member(#authority, $user_id, $team_id),
  team_role(#authority, $team_id, $blog_id, #contributor),
  [#read, #update].contains($operation);

// unauthenticated users have read access on published articles
allow if
  operation(#ambient, #read),
  article(#ambient, $blog_id, $article_id),
  readable(#authority, $blog_id, $article_id);

// authorize if got the rights on this blog and article
allow if
  blog(#ambient, $blog_id),
  article(#ambient, $blog_id, $article_id),
  operation(#ambient, $operation),
  right(#authority, $blog_id, $article_id, $operation);


// catch all rule in case the allow did not match
deny if true;
    &quot;).unwrap();

    for (_span, fact) in policies.facts.drain(..) {
        verifier.add_fact(fact)?;
    }

    for (_span, rule) in policies.rules.drain(..) {
        verifier.add_rule(rule)?;
    }

    for (_span, check) in policies.checks.drain(..) {
        verifier.add_check(check)?;
    }

    for (_span, policy) in policies.policies.drain(..) {
        verifier.add_policy(policy)?;
    }

    let res = verifier.verify()?;
    let dur = std::time::Instant::now() - start;
    //println!(&quot;res: {:?}&quot;, res);
    println!(&quot;{}&quot;, verifier.print_world());

    println!(&quot;ran in {:?}&quot;, dur);
    Ok(())
}
</code></pre>
<p>The entire program (key generation, token creation, serialization, deserialization, signature validation and facts verification) <strong>runs in 0.5 ms</strong>. So even with all of these features, Biscuit is fast enough to get out of your way.</p>
<h2 id="whats-next">What&#39;s next</h2>
<p>You can already start using Biscuit in <a href="https://github.com/clevercloud/biscuit-rust">Rust</a>, <a href="https://github.com/clevercloud/biscuit-java">Java</a> and <a href="https://github.com/flynn/biscuit-go">Go</a>.</p>
<p>The Rust version can also generate C bindings, currently used to develop a <a href="https://github.com/divarvel/biscuit-haskell">Haskell version</a>, and there is a <a href="https://github.com/clevercloud/biscuit-wasm">WebAssembly wrapper</a>.</p>
<p>As an example integration, you can check out a <a href="https://github.com/clevercloud/biscuit-pulsar">Biscuit based authorization plugin</a> for <a href="https://pulsar.apache.org/">Apache Pulsar</a>.</p>
<p>The <a href="https://github.com/clevercloud/biscuit">specification</a> is developed in the open, you can contribute.</p>
<script>
$("table").addClass("table-bordered");
</script>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Biscuit, the foundation for your authorization systems</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2021/04/12/introduction-to-biscuit/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Mon, 12 Apr 2021 09:45:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[authentication]]></category>
		<category><![CDATA[authorization]]></category>
		<category><![CDATA[biscuit]]></category>
		<category><![CDATA[cryptography]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2021/04/12/introduction-to-biscuit/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-1368x528.png 1368w" sizes="(max-width: 1400px) 100vw, 1400px" /></p><p>After 2 years of development, I am proud to share with you the official release of Biscuit, the authentication and authorization token we develop to manage access to our systems.</p>
<span id="more-2832"></span>

<p>Where does it fit in the current authentication projects landscape (and why all of those cake themed names)?</p>
<ul>
<li>Cookies are a storage area in browsers, which can contain a session identifier
(the session data is then in a database, indexed by those identifiers), or
authentication tokens. They&#39;re good with lots of chocolate chips.</li>
<li><a href="https://jwt.io/">JSON Web Tokens</a> or JWT
(<a href="https://tools.ietf.org/html/rfc7519#section-1">pronounced &quot;jot&quot;</a>) contain
cryptographically signed data. Since the signature guarantees it has not been
modified, a web application could store session data in a JWT and send it in
a cookie, and read it from HTTP requests. The signature can be done with
secret key cryptography (HMAC algorithm), or public key cryptography (RSA,
ECDSA). They can even be encrypted, and stored in a cookie, but they cannot be
eaten.</li>
<li><a href="https://research.google/pubs/pub41892/">Macaroons</a> are cryptographically
signed (HMAC) tokens focused on authorization. They embed <em>caveats</em>, conditions
that the request must fit. They support attenuation: the holder of a token can
create a new valid token by adding a caveat, further restricting the token. A
macaroon can be stored in a cookie. It is also an Italian almond or coconut-based cake (do not confuse it with the French <em>macaron</em>, which is also an almond-based cake)</li>
<li><a href="https://www.openpolicyagent.org/">Open Policy Agent</a> is a server-side logic
language used to encode authorization policies</li>
</ul>
<p>Biscuit unifies these various approaches:</p>
<ul>
<li>it can be signed with public key or secret key cryptography like JWT</li>
<li>it can be attenuated like Macaroons</li>
<li>it comes with a powerful logic language to write authorization policies, like OPA, but those policies can also be carried by the token</li>
</ul>
<p>By assembling those techniques, it opens up an array of authorization patterns that were not possible before.</p>
<p><img src="https://cdn.clever-cloud.com/uploads/2021/08/token_disambiguation.jpg" alt=""></p>
<p>When we started working on Biscuit, we were battling common issues in modern web applications:</p>
<ul>
<li>In a microservices system, how do you handle authorization from an initial
request, as it goes from service to service?</li>
<li>How do you reconcile an application&#39;s authorization policies (often some basic
roles and groups) with a client&#39;s organization chart?</li>
</ul>
<p>The microservices case is tricky: the initial request may come from a user for which we can look up a list of rights, but some services in the request tree may not even have a concept of user: at Clever Cloud, the service that launches virtual machines never hears about who requested a new deployment. With JWT, you could generate a temporary token in the user-facing API, and carry that from service to service. But then, any service holding that token has the entire set of rights for that request. Also, we need to make sure the authorization policies are evaluated in the same way in all services. With Macaroons, a service can attenuate the token before sending it to the next service, by adding a <em>caveat</em>, a condition over the current request (expiration date, limiting to read operations, restricting file paths to a prefix...). Unfortunately, Macaroon validation requires knowing the secret key used to generate the initial token.</p>
<p>Macaroons use a design based on chaining HMAC calculations: start from the initial secret, sign the first caveat, then for each new caveat, sign it using the previous signature as key. If you know the initial secret key, you can reconstruct the entire chain and verify that you obtain the same initial signature. But distributing that key in every service is a security risk: if someone gets access to this key, they can create a token with any authorization level they want. On the other side, JWT only requires verifiers to know a public key, and the private key can be kept in the service creating the token.</p>
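<p>To illustrate the chaining structure, here is a sketch with a toy hash standing in for HMAC (this is <strong>not</strong> real cryptography, only the shape of the construction):</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for HMAC(key, message), only to show the chaining structure.
fn tag(key: u64, message: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    message.hash(&mut h);
    h.finish()
}

// sig_0 = HMAC(root_key, identifier); sig_{i+1} = HMAC(sig_i, caveat_i)
fn macaroon_tag(root_key: u64, identifier: &str, caveats: &[&str]) -> u64 {
    caveats
        .iter()
        .fold(tag(root_key, identifier), |sig, caveat| tag(sig, caveat))
}

fn main() {
    let root_key = 0x5eed;
    // a holder attenuates the token by signing a new caveat with the last tag...
    let attenuated = macaroon_tag(root_key, "token-1", &["operation = read"]);
    // ...and the service, knowing root_key, recomputes the whole chain
    assert_eq!(attenuated, macaroon_tag(root_key, "token-1", &["operation = read"]));
    // the caveat cannot be stripped: removing it changes the tag
    assert_ne!(attenuated, macaroon_tag(root_key, "token-1", &[]));
    println!("ok");
}
```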
<p>That was one of the motivating goals for Biscuit: <strong>what if we could attenuate the token, but still be able to verify it with public key cryptography?</strong></p>
<p>As it turns out, a cryptographic concept called <em>aggregated signatures</em> can help us: we take multiple messages, each individually signed with a different public key, and we aggregate all of those signatures into one main signature. From that aggregated signature, it is impossible to remove one of the messages and keep a valid signature, but we can always add more signed messages. We can verify the aggregated signatures if we know the public keys for each message. From this, we reproduce the Macaroon design, with public key cryptography.</p>
<p>To provide attenuation, we could have reused the Macaroons approach with caveats, but its user experience was challenging: a caveat is basically a byte array for which you must design your own system to encode and test conditions.</p>
<p>For Biscuit, we chose a more general approach. We provide a logic language based on Datalog to write authorization policies. It can store data, like JWT, or small conditions like Macaroons, but it is also able to represent more complex rules like role-based access control, delegation, hierarchies. Those authorization policies can be carried by the token or provided on the verification side. They are encoded in a small binary format for transport. Additionally, it is fast to evaluate: generally, the entire process of checking the signature, deserializing the token and testing the authorization policies is done under 1 ms.</p>
<figure>
<img alt="Example Datalog rule" src="https://cdn.clever-cloud.com/uploads/2021/08/biscuit-datalog-example.png"/>
  <figcaption>Example Datalog rule</figcaption>
</figure>

<p>With this language (that can be learned in minutes), you get a unified way of representing complex business rules, in a testable and portable format. You can explore how policies work in a simulated environment, even write unit tests for them, then deploy them as dry-run tests and see how they would react to real world requests. Instead of a binary allow/deny result, you can get fine-grained info, and query structured data. As an example, a request to list files would be accepted if we have the rights for it, and we can also get the filtered list of files we can access, even taking into account the attenuation rules carried by the token.</p>
<p>Multiple rule systems can be combined, which is useful for the second problem, about the mismatch between an application&#39;s policies and its user&#39;s needs:</p>
<ul>
<li>an application using GitHub or Twitter OAuth requests too many rights,
because to get a subset of rights, like read access to one repository, you
must grant it for all repositories</li>
<li>a SaaS application or hosting company for which all users from one client
share one account</li>
<li>or roles and groups that do not match work segmentation for users</li>
</ul>
<p>Traditionally, this is solved in two ways:</p>
<ul>
<li>the service accumulates more and more complicated authorization policies,
and the user management panel becomes a mess</li>
<li>it connects itself to external authorization systems, like Active Directory
or Keycloak, and lets the users manage them</li>
</ul>
<p>With Biscuit, there&#39;s another way. Authorization policies can be provided by the verification service, but they can also be carried by the token. The service can specify its policies, and the user can attenuate tokens with their own policies. And they will all be evaluated in the same way, while guaranteeing that the token cannot get more rights with user policies. So from an initial token, an entire parallel authorization design can be developed that will still be compatible with the original one.</p>
<p>You can also take an existing token, and restrict its access to a minimal set of resources, like you would need for your CI/CD systems. Biscuit makes a lot of new patterns possible, and we will keep exploring them in the future. Right now, let&#39;s look at an existing use case.</p>
<h2 id="example">Example</h2>
<p>At Clever Cloud, we are heavy users of <a href="https://pulsar.apache.org/">Apache Pulsar</a>. To provide this service to our users, we needed a flexible way to make it multitenant. By integrating Biscuit as an authorization plugin, using the Java implementation of Biscuit, we can provide a separate namespace for each user, along with a token that has full rights on that namespace. From there, the token can be attenuated into new tokens with various policies:</p>
<ul>
<li>limiting access to a topic name prefix</li>
<li>allowing subscription on only one topic</li>
<li>allowing message production on only one topic</li>
<li>adding an expiration time</li>
</ul>
<p>The authorization plugin only needs to check the token&#39;s initial rights to the namespace, and verify that the request matches the various checks added in attenuation.</p>
<p>As an example, we use that internally for a remote administration agent. Each new instance of the agent gets a new token derived from the original one, restricted to listening on its own topic (the topic name is a UUID). Then, when it receives a message, it also gets a short-lived token that can be used to send answers to a single temporary topic.</p>
<h2 id="whats-next">What&#39;s next</h2>
<p>The next article will dive into how to write policies and how to integrate it into your application. You can already test that language in the <a href="https://play-with-biscuit.cleverapps.io/">online playground</a>.</p>
<p>You can start using Biscuit right now in <a href="https://github.com/clevercloud/biscuit-rust">Rust</a>, <a href="https://github.com/clevercloud/biscuit-java">Java</a> and <a href="https://github.com/flynn/biscuit-go">Go</a>.</p>
<p>The Rust version can also generate C bindings, currently used to develop a <a href="https://github.com/divarvel/biscuit-haskell">Haskell version</a>, and there is a <a href="https://github.com/clevercloud/biscuit-wasm">WebAssembly wrapper</a>.</p>
<p>As an example integration, you can check out a <a href="https://github.com/clevercloud/biscuit-pulsar">Biscuit based authorization plugin</a> for <a href="https://pulsar.apache.org/">Apache Pulsar</a>.</p>
<p>The <a href="https://github.com/clevercloud/biscuit">specification</a> is developed in the open, and you can contribute.</p>
<p>We are just at the beginning of this exciting new technology, so we are still learning how to use it and exploring new design and authorization patterns. I can&#39;t wait to see the fun applications you will come up with using Biscuit!</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/biscuit-introduction-1-1368x528.png 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>After 2 years of development, I am proud to share with you the official release of Biscuit, the authentication and authorization token we develop to manage access to our systems.</p>
<span id="more-2832"></span>

<p>Where does it fit in the current landscape of authentication projects (and why all those cake-themed names)?</p>
<ul>
<li>Cookies are a storage area in browsers, which can contain a session identifier
(the session data is then in a database, indexed by those identifiers), or
authentication tokens. They&#39;re good with lots of chocolate chips.</li>
<li><a href="https://jwt.io/">JSON Web Tokens</a> or JWT
(<a href="https://tools.ietf.org/html/rfc7519#section-1">pronounced &quot;jot&quot;</a>) contain
cryptographically signed data. Since the signature guarantees it has not been
modified, a web application could store session data in a JWT and send it in
a cookie, and read it from HTTP requests. The signature can be done with
secret key cryptography (HMAC algorithm), or public key cryptography (RSA,
ECDSA). They can even be encrypted, and stored in a cookie, but they cannot be
eaten.</li>
<li><a href="https://research.google/pubs/pub41892/">Macaroons</a> are cryptographically
signed (HMAC) tokens focused on authorization. They embed <em>caveats</em>, conditions
that the request must fit. They support attenuation: the holder of a token can
create a new valid token by adding a caveat, further restricting the token. A
macaroon can be stored in a cookie. It is also an Italian almond or coconut-based cake (do not confuse it with the French <em>macaron</em>, which is also an almond-based cake).</li>
<li><a href="https://www.openpolicyagent.org/">Open Policy Agent</a> (OPA) is a server-side policy engine whose logic language, Rego, is used to encode authorization policies</li>
</ul>
<p>Biscuit unifies these various approaches:</p>
<ul>
<li>it can be signed with public key or secret key cryptography like JWT</li>
<li>it can be attenuated like Macaroons</li>
<li>it comes with a powerful logic language to write authorization policies, like OPA, but those policies can also be carried by the token</li>
</ul>
<p>By assembling those techniques, it opens up an array of authorization patterns that were not possible before.</p>
<p><img src="https://cdn.clever-cloud.com/uploads/2021/08/token_disambiguation.jpg" alt=""></p>
<p>When we started working on Biscuit, we were battling common issues in modern web applications:</p>
<ul>
<li>In a microservices system, how do you handle authorization from an initial
request, as it goes from service to service?</li>
<li>How do you reconcile an application&#39;s authorization policies (often some basic
roles and groups) with a client&#39;s organization chart?</li>
</ul>
<p>The microservices case is tricky: the initial request may come from a user for which we can look up a list of rights, but some services in the request tree may not even have a concept of user: at Clever Cloud, the service that launches virtual machines never hears about who requested a new deployment. With JWT, you could generate a temporary token in the user-facing API, and carry that from service to service. But then, any service holding that token has the entire set of rights for that request. Also, we need to make sure the authorization policies are evaluated in the same way in all services. With Macaroons, a service can attenuate the token before sending it to the next service, by adding a <em>caveat</em>, a condition over the current request (expiration date, limiting to read operations, restricting file paths to a prefix...). Unfortunately, Macaroon validation requires knowing the secret key used to generate the initial token.</p>
<p>Macaroons use a design based on chaining HMAC calculations: start from the initial secret, sign the first caveat, then for each new caveat, sign it using the previous signature as key. If you know the initial secret key, you can reconstruct the entire chain and verify that you obtain the same initial signature. But distributing that key in every service is a security risk: if someone gets access to this key, they can create a token with any authorization level they want. On the other side, JWT only requires verifiers to know a public key, and the private key can be kept in the service creating the token.</p>
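<p>The chained-HMAC design can be sketched in a few lines of Python (a toy illustration of the principle, not the actual Macaroon wire format; all function names here are made up for the example):</p>

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 of message under key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def mint(root_key: bytes, identifier: bytes):
    """Create a token: an empty caveat list and an initial signature."""
    return [], sign(root_key, identifier)

def attenuate(token, caveat: bytes):
    """Anyone holding a token can add a caveat: the previous signature
    becomes the key used to sign the new caveat."""
    caveats, sig = token
    return caveats + [caveat], sign(sig, caveat)

def verify(root_key: bytes, identifier: bytes, token) -> bool:
    """Only the holder of the root key can replay the whole chain."""
    caveats, sig = token
    expected = sign(root_key, identifier)
    for caveat in caveats:
        expected = sign(expected, caveat)
    return hmac.compare_digest(expected, sig)

root = b"root secret key"
token = mint(root, b"token-1234")
token = attenuate(token, b"operation = read")     # restrict to reads
token = attenuate(token, b"path prefix = /tmp/")  # restrict the paths
print(verify(root, b"token-1234", token))  # True
```

<p>Note how <code>verify</code> needs <code>root_key</code>: that is exactly the key-distribution problem described above, and what Biscuit&#39;s public-key design avoids.</p>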
<p>That was one of the motivating goals for Biscuit: <strong>what if we could attenuate the token, but still be able to verify it with public key cryptography?</strong></p>
<p>As it turns out, a cryptographic concept called <em>aggregated signatures</em> can help us: we take multiple messages, each individually signed with a different public key, and we aggregate all of those signatures into one main signature. From that aggregated signature, it is impossible to remove one of the messages and keep a valid signature, but we can always add more signed messages. We can verify the aggregated signatures if we know the public keys for each message. From this, we reproduce the Macaroon design, with public key cryptography.</p>
<p>To provide attenuation, we could have reused the Macaroons approach with caveats, but its user experience was challenging: a caveat is basically a byte array for which you must design your own system to encode and test conditions.</p>
<p>For Biscuit, we chose a more general approach. We provide a logic language based on Datalog to write authorization policies. It can store data, like JWT, or small conditions, like Macaroons, but it can also represent more complex rules like role-based access control, delegation, or hierarchies. Those authorization policies can be carried by the token or provided on the verification side. They are encoded in a small binary format for transport. Additionally, it is fast to evaluate: generally, the entire process of checking the signature, deserializing the token and testing the authorization policies is done in under 1 ms.</p>
<figure>
<img alt="Example Datalog rule" src="https://cdn.clever-cloud.com/uploads/2021/08/biscuit-datalog-example.png"/>
  <figcaption>Example Datalog rule</figcaption>
</figure>
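<p>As a rough sketch of how such a check is evaluated (a toy Python model, not Biscuit&#39;s actual Datalog engine), a check succeeds when some binding of its variables matches the known facts:</p>

```python
# Toy model of a Datalog-style authorization check: facts are tuples, and a
# check like `resource($r), operation("read"), right($r, "read")` succeeds
# if some resource $r satisfies all three conditions at once.
facts = {
    ("right", "file1", "read"),   # provided by the token
    ("resource", "file1"),        # describes the current request
    ("operation", "read"),
}

def check_read(facts) -> bool:
    resources = [f[1] for f in facts if f[0] == "resource"]
    return ("operation", "read") in facts and any(
        ("right", r, "read") in facts for r in resources
    )

print(check_read(facts))                                 # True
print(check_read(facts - {("right", "file1", "read")}))  # False: right missing
```

<p>The real engine generalizes this pattern matching to arbitrary facts and rules, so richer policies (expiration, prefixes, roles) follow the same shape.</p>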

<p>With this language (which can be learned in minutes), you get a unified way of representing complex business rules, in a testable and portable format. You can explore how policies work in a simulated environment, even write unit tests for them, then deploy them as dry-run tests and see how they would react to real-world requests. Instead of a binary allow/deny result, you get fine-grained information and can query structured data. As an example, a request to list files would be accepted if we have the rights for it, and we can also get the filtered list of files we can access, even taking into account the attenuation rules carried by the token.</p>
<p>Multiple rule systems can be combined, which is useful for the second problem: the mismatch between an application&#39;s policies and its users&#39; needs:</p>
<ul>
<li>an application using GitHub or Twitter OAuth requests too many rights, because to get a small right like read access to one repository, you must get it for all repositories</li>
<li>a SaaS application or hosting company for which all users from one client
share one account</li>
<li>or roles and groups that do not match work segmentation for users</li>
</ul>
<p>Traditionally, this is solved in two ways:</p>
<ul>
<li>the service accumulates more and more complex authorization policies, and the user management panel becomes a mess</li>
<li>it connects to external authorization systems, like Active Directory or Keycloak, and lets the users manage them</li>
</ul>
<p>With Biscuit, there&#39;s another way. Authorization policies can be provided by the verification service, but they can also be carried by the token. The service can specify its policies, and the user can attenuate tokens with their own policies. And they will all be evaluated in the same way, while guaranteeing that the token cannot get more rights with user policies. So from an initial token, an entire parallel authorization design can be developed that will still be compatible with the original one.</p>
<p>You can also take an existing token and restrict its access to a minimal set of resources, as you would need for your CI/CD systems. Biscuit makes a lot of new patterns possible, and we will keep exploring them in the future. Right now, let&#39;s look at an existing use case.</p>
<h2 id="example">Example</h2>
<p>At Clever Cloud, we are heavy users of <a href="https://pulsar.apache.org/">Apache Pulsar</a>. To provide this service to our users, we needed a flexible way to make it multitenant. By integrating Biscuit as an authorization plugin, using the Java implementation of Biscuit, we can give each user a token with full rights on their own namespace. From there, the token can be attenuated into new tokens with various policies:</p>
<ul>
<li>limiting access to a topic name prefix</li>
<li>allowing subscription on only one topic</li>
<li>allowing message production on only one topic</li>
<li>adding an expiration time</li>
</ul>
<p>The authorization plugin only needs to check the token&#39;s initial rights to the namespace, and verify that the request matches the various checks added in attenuation.</p>
<p>As an example, we use that internally for a remote administration agent. Each new instance of the agent gets a new token derived from the original one, restricted to listening on its own topic (the topic name is a UUID). Then, when it receives a message, it also gets a short-lived token that can be used to send answers to a single temporary topic.</p>
<h2 id="whats-next">What&#39;s next</h2>
<p>The next article will dive into how to write policies and how to integrate them into your application. You can already test the language in the <a href="https://play-with-biscuit.cleverapps.io/">online playground</a>.</p>
<p>You can start using Biscuit right now in <a href="https://github.com/clevercloud/biscuit-rust">Rust</a>, <a href="https://github.com/clevercloud/biscuit-java">Java</a> and <a href="https://github.com/flynn/biscuit-go">Go</a>.</p>
<p>The Rust version can also generate C bindings, currently used to develop a <a href="https://github.com/divarvel/biscuit-haskell">Haskell version</a>, and there is a <a href="https://github.com/clevercloud/biscuit-wasm">WebAssembly wrapper</a>.</p>
<p>As an example integration, you can check out a <a href="https://github.com/clevercloud/biscuit-pulsar">Biscuit based authorization plugin</a> for <a href="https://pulsar.apache.org/">Apache Pulsar</a>.</p>
<p>The <a href="https://github.com/clevercloud/biscuit">specification</a> is developed in the open, and you can contribute.</p>
<p>We are just at the beginning of this exciting new technology, so we are still learning how to use it and exploring new design and authorization patterns. I can&#39;t wait to see the fun applications you will come up with using Biscuit!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>In Defense of Optimization Work</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2018/09/27/in-defense-of-optimization-work/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Thu, 27 Sep 2018 17:15:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[performance]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2018/09/27/in-defense-of-optimization-work/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>It is common knowledge that <a href="https://blog.codinghorror.com/hardware-is-cheap-programmers-are-expensive/">hardware is cheap, and programmers are expensive</a>, and that most performance issues can be easily solved by throwing more and bigger hardware at it. But is it really cheaper in the long run? Is there still some room for optimization work?</p>
<span id="more-2814"></span>

<blockquote>
<p>&quot;The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.&quot;</p>
</blockquote>
<div class="quote-credits">
Donald Knuth, <a href="https://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/">or Tony Hoare according to Donald Knuth, or Edsger W. Dijkstra according to Tony Hoare</a>
</div>

<p>Yes, we engineers like to optimize, and spend time shaving bytes and microseconds off our systems. This is often at odds with the requirement to build features and squash bugs instead. Before we get into the cost comparison of optimization work vs. hardware, allow me to reframe the issue: performance is not a goal isolated from other engineering and business issues. Like security, it is a cross-cutting problem:</p>
<ul>
<li>performance is a reliability issue, because it will trigger bigger and more frequent production issues</li>
<li>performance is a user experience issue: it directly affects how your users work with your systems</li>
<li>performance is a cost issue</li>
</ul>
<p>And like security, it is not something we can ignore on a whim, it is part of the job.</p>
<p>Moreover, while we hear hero stories of optimizations involving kernel work or hand-written assembly, most performance work is actually quite simple, and boils down to tasks like:</p>
<ul>
<li>adding a cache</li>
<li>adding database indexes</li>
<li>doing some work inside a database query instead of loading all the data and analyzing it manually</li>
</ul>
<p>It is actually rare (in web services) for optimization work to require writing assembly by hand or other equally fun tasks.</p>
<h2 id="the-cost-of-not-optimizing-code">The cost of not optimizing code</h2>
<p>If we take $100,000 as average programmer salary (in the US), it comes down to around $400 per day, so $50 per hour (20-21 days per month, 8 hours per day). So spending an afternoon optimizing code would amount to $200.</p>
<p>Using bigger or more hardware looks comparatively cheap. Let’s assume adding another machine would cost $20 to $50 per month, and choose $20/month. A naive calculation shows that we would recoup the cost of the optimization work after 10 months. At first, planning 10 months ahead might not seem worthwhile, and the cost of the machine is not too high. But after those 10 months, the hardware starts costing more than having an engineer look at the problem.</p>
<p>Because here is the first difference: an optimization task is a fixed cost, paid once, while the hardware cost recurs every month (if the hardware is bought instead of rented, it is amortized until the date of replacement). We easily trick ourselves into comparing the local, monthly cost, without looking at the long term.</p>
<p>But there is another aspect of the problem: performance issues are linked to business growth. Often not directly, but through some usage metric, like a number of messages sent, or a number of searches per second. Those grow with the number of customers. But they also grow with customer usage: if everything goes well, customers will use the product more and more. Do not expect performance issues to follow business growth linearly.</p>
<p>Let&#39;s take as an example one of the metrics used to evaluate a startup&#39;s growth: <a href="http://www.paulgraham.com/growth.html">the growth rate should be around 5-7% per week</a>.</p>
<p>Let&#39;s use the number of users as growth metric, and ignore usage growth.</p>
<p>Assuming that one node of the application is at full capacity with the current number of customers N, we add another node.</p>
<p>At 5% per week, we’re around 21% growth per month (21.550625% exactly, and that&#39;s calculated as a geometric progression). We will double the number of customers in just 15 weeks!</p>
<p>We can see the costs in the following graph. Red points represent the monthly cost, while blue points indicate how much we spent so far.</p>
<figure>
  <a href="https://www2.cleverapps.io/app/uploads/2021/08/growth-graph.png" rel="noopener noreferrer" target="_blank">
    <img style="width:100%" src="https://www2.cleverapps.io/app/uploads/2021/08/growth-graph.png">
  </a>
  <figcaption>Red points: monthly cost — Blue points: what we've spent so far</figcaption>
</figure>

<p>So we would pay for one extra machine during the first 4 months, then we would need to add another. We will reach 3N at week 24 (6 months), and by then we will have already spent $200 more than the cost of a single machine. So this engineering time would pay for itself in 6 months, not 10. At week 40, we have already spent $560, and the next week we will add another machine, because we will reach 7N.</p>
<p>After a year, we have reached 12N and paid $1140. We likely hit other issues linked to the number of machines running: on-call incidents, time spent updating them, etc. (But for all of those issues, there is Clever Cloud!) So we probably have more than 12 machines doing the work, and we spent considerable engineering time keeping them running.</p>
<p>Here&#39;s the equation: for a growth rate G, a number of months M, and a monthly cost C for one machine:</p>
<figure>
  <img style=" width: 180px;margin:0 auto;display: block;" src="https://www2.cleverapps.io/app/uploads/2021/08/growth-equation.png">
</figure>

<p>You can also test it in Wolfram Alpha <a href="http://www.wolframalpha.com/input/?i=sum+20+*+floor(1.05%5E(4m))+from+m+%3D+0+to+12N">(here for $20/month and growth at 5%)</a></p>
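<p>The same sum can be sketched in Python (a rough model using the simplification of 4 weeks per month, so the exact figures differ slightly from the weekly numbers above):</p>

```python
import math

def extra_machines(month: int, weekly_growth: float = 0.05) -> int:
    """Machines needed at a given month, if one machine serves the initial
    N customers and the customer count grows 5% per week (~4 weeks/month)."""
    return math.floor((1 + weekly_growth) ** (4 * month))

def total_spend(months: int, cost_per_month: int = 20) -> int:
    """Cumulative spend on extra machines from month 0 through `months`."""
    return sum(cost_per_month * extra_machines(m) for m in range(months + 1))

# One extra machine suffices for the first months, then the bill compounds:
for m in (0, 6, 12):
    print(f"month {m}: {extra_machines(m)} machine(s), ${total_spend(m)} so far")
```

<p>The point is not the exact dollar amounts but the shape of the curve: the monthly cost grows geometrically while the optimization is a one-time $200.</p>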
<p>To sum up, choosing the current settings as:</p>
<ul>
<li>$200 of work (4h at $50/h)</li>
<li>$20/month for a machine</li>
<li>5% growth per week</li>
</ul>
<p>We would spend $200 in new machines over the next 6 months, which is pretty short term. But there’s another way to look at it. We still need to fix the performance issue, but adding more hardware would buy us time. For the next 4 months, we would pay only $20 per month to delay the issue, and let our engineers work properly on it instead of putting out fires. Even better, now we have the means to plan hardware costs following business growth.</p>
<blockquote>
<p>&quot;Hardware is cheap, programmers are expensive;&quot;</p>
</blockquote>
<p style="margin-top: -60px; margin-bottom: 70px; font-style: italic; text-align: center;">…but performance debt comes with interest <span style="font-style:normal">(/¯–‿･)/¯</span></p>

<figure>
  <img style=" width: 450px;margin:0 auto;display: block;max-width:100%" src="https://www2.cleverapps.io/app/uploads/2021/08/goodbook.jpeg">
  <figcaption>The author, posing with what seems to be a good book.</figcaption>
</figure>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/in-defense-of-optimization-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>It is common knowledge that <a href="https://blog.codinghorror.com/hardware-is-cheap-programmers-are-expensive/">hardware is cheap, and programmers are expensive</a>, and that most performance issues can be easily solved by throwing more and bigger hardware at it. But is it really cheaper in the long run? Is there still some room for optimization work?</p>
<span id="more-2814"></span>

<blockquote>
<p>&quot;The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.&quot;</p>
</blockquote>
<div class="quote-credits">
Donald Knuth, <a href="https://shreevatsa.wordpress.com/2008/05/16/premature-optimization-is-the-root-of-all-evil/">or Tony Hoare according to Donald Knuth, or Edsger W. Dijkstra according to Tony Hoare</a>
</div>

<p>Yes, we engineers like to optimize, and spend time shaving bytes and microseconds off our systems. This is often at odds with the requirement to build features and squash bugs instead. Before we get into the cost comparison of optimization work vs. hardware, allow me to reframe the issue: performance is not a goal isolated from other engineering and business issues. Like security, it is a cross-cutting problem:</p>
<ul>
<li>performance is a reliability issue, because it will trigger bigger and more frequent production issues</li>
<li>performance is a user experience issue: it directly affects how your users work with your systems</li>
<li>performance is a cost issue</li>
</ul>
<p>And like security, it is not something we can ignore on a whim, it is part of the job.</p>
<p>Moreover, while we hear hero stories of optimizations involving kernel work or hand-written assembly, most performance work is actually quite simple, and boils down to tasks like:</p>
<ul>
<li>adding a cache</li>
<li>adding database indexes</li>
<li>doing some work inside a database query instead of loading all the data and analyzing it manually</li>
</ul>
<p>It is actually rare (in web services) for optimization work to require writing assembly by hand or other equally fun tasks.</p>
<h2 id="the-cost-of-not-optimizing-code">The cost of not optimizing code</h2>
<p>If we take $100,000 as average programmer salary (in the US), it comes down to around $400 per day, so $50 per hour (20-21 days per month, 8 hours per day). So spending an afternoon optimizing code would amount to $200.</p>
<p>Using bigger or more hardware looks comparatively cheap. Let’s assume adding another machine would cost $20 to $50 per month, and choose $20/month. A naive calculation shows that we would recoup the cost of the optimization work after 10 months. At first, planning 10 months ahead might not seem worthwhile, and the cost of the machine is not too high. But after those 10 months, the hardware starts costing more than having an engineer look at the problem.</p>
<p>Because here is the first difference: an optimization task is a fixed cost, paid once, while the hardware cost recurs every month (if the hardware is bought instead of rented, it is amortized until the date of replacement). We easily trick ourselves into comparing the local, monthly cost, without looking at the long term.</p>
<p>But there is another aspect of the problem: performance issues are linked to business growth. Often not directly, but through some usage metric, like a number of messages sent, or a number of searches per second. Those grow with the number of customers. But they also grow with customer usage: if everything goes well, customers will use the product more and more. Do not expect performance issues to follow business growth linearly.</p>
<p>Let&#39;s take as an example one of the metrics used to evaluate a startup&#39;s growth: <a href="http://www.paulgraham.com/growth.html">the growth rate should be around 5-7% per week</a>.</p>
<p>Let&#39;s use the number of users as growth metric, and ignore usage growth.</p>
<p>Assuming that one node of the application is at full capacity with the current number of customers N, we add another node.</p>
<p>At 5% per week, we’re around 21% growth per month (21.550625% exactly, and that&#39;s calculated as a geometric progression). We will double the number of customers in just 15 weeks!</p>
<p>We can see the costs in the following graph. Red points represent the monthly cost, while blue points indicate how much we spent so far.</p>
<figure>
  <a href="https://www2.cleverapps.io/app/uploads/2021/08/growth-graph.png" rel="noopener noreferrer" target="_blank">
    <img style="width:100%" src="https://www2.cleverapps.io/app/uploads/2021/08/growth-graph.png">
  </a>
  <figcaption>Red points: monthly cost — Blue points: what we've spent so far</figcaption>
</figure>

<p>So we would pay for one extra machine during the first 4 months, then we would need to add another. We will reach 3N at week 24 (6 months), and by then we will have already spent $200 more than the cost of a single machine. So this engineering time would pay for itself in 6 months, not 10. At week 40, we have already spent $560, and the next week we will add another machine, because we will reach 7N.</p>
<p>After a year, we have reached 12N and paid $1140. We likely hit other issues linked to the number of machines running: on-call incidents, time spent updating them, etc. (But for all of those issues, there is Clever Cloud!) So we probably have more than 12 machines doing the work, and we spent considerable engineering time keeping them running.</p>
<p>Here&#39;s the equation: for a growth rate G, a number of months M, and a monthly cost C for one machine:</p>
<figure>
  <img style=" width: 180px;margin:0 auto;display: block;" src="https://www2.cleverapps.io/app/uploads/2021/08/growth-equation.png">
</figure>

<p>You can also test it in Wolfram Alpha <a href="http://www.wolframalpha.com/input/?i=sum+20+*+floor(1.05%5E(4m))+from+m+%3D+0+to+12N">(here for $20/month and growth at 5%)</a></p>
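<p>The same sum can be sketched in Python (a rough model using the simplification of 4 weeks per month, so the exact figures differ slightly from the weekly numbers above):</p>

```python
import math

def extra_machines(month: int, weekly_growth: float = 0.05) -> int:
    """Machines needed at a given month, if one machine serves the initial
    N customers and the customer count grows 5% per week (~4 weeks/month)."""
    return math.floor((1 + weekly_growth) ** (4 * month))

def total_spend(months: int, cost_per_month: int = 20) -> int:
    """Cumulative spend on extra machines from month 0 through `months`."""
    return sum(cost_per_month * extra_machines(m) for m in range(months + 1))

# One extra machine suffices for the first months, then the bill compounds:
for m in (0, 6, 12):
    print(f"month {m}: {extra_machines(m)} machine(s), ${total_spend(m)} so far")
```

<p>The point is not the exact dollar amounts but the shape of the curve: the monthly cost grows geometrically while the optimization is a one-time $200.</p>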
<p>To sum up, choosing the current settings as:</p>
<ul>
<li>$200 of work (4h at $50/h)</li>
<li>$20/month for a machine</li>
<li>5% growth per week</li>
</ul>
<p>We would spend $200 in new machines over the next 6 months, which is pretty short term. But there’s another way to look at it. We still need to fix the performance issue, but adding more hardware would buy us time. For the next 4 months, we would pay only $20 per month to delay the issue, and let our engineers work properly on it instead of putting out fires. Even better, now we have the means to plan hardware costs following business growth.</p>
<blockquote>
<p>&quot;Hardware is cheap, programmers are expensive;&quot;</p>
</blockquote>
<p style="margin-top: -60px; margin-bottom: 70px; font-style: italic; text-align: center;">…but performance debt comes with interest <span style="font-style:normal">(/¯–‿･)/¯</span></p>

<figure>
  <img style=" width: 450px;margin:0 auto;display: block;max-width:100%" src="https://www2.cleverapps.io/app/uploads/2021/08/goodbook.jpeg">
  <figcaption>The author, posing with what seems to be a good book.</figcaption>
</figure>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Spectre and Meltdown</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2018/01/04/dealing-with-spectre-and-meltdown/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Thu, 04 Jan 2018 11:59:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Update]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2018/01/04/dealing-with-spectre-and-meltdown/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>Yesterday two issues affecting CPUs have been released to the public.</p>
<p><strong>TL;DR</strong>: the attacks are named Meltdown and Spectre. They allow reading the memory of the OS or of other processes, to steal secrets or gather information for further exploits. Part of the mitigation can greatly affect the performance of running code. In particular, these attacks make it easy to cross container boundaries, and in some cases (not ours) even VM boundaries.</p>
<p>In addition to servers, consumer machines are affected, especially through browsers, so you should definitely update your operating system as well as your browsers.</p>
<span id="more-2812"></span>

<h2 id="what-it-means-for-clever-cloud-users">What it means for Clever Cloud users</h2>
<p>Your applications will be (or already have been) automatically restarted (just like any other maintenance deployment). The addons will be patched and restarted in place in the following hours. This will generate limited downtime on addons (usually around a minute, depending on the addon start-up time).</p>
<p>In addition to restarting virtual machines, we will also need to restart physical machines, as the attacks theoretically allow crossing VM boundaries. This attack is not usable (yet?) on Clever Cloud thanks to our virtualization choices and OS hardening, but we will deploy patches preemptively. Physical machine updates will take place in the following days and will not impact applications. We are still working out the best solution for addons, but it will definitely incur additional downtime for them.</p>
<p>The patches, while mitigating the issues, also come with performance regressions. The impact heavily depends on the workload as well as the exact CPU model. The CPUs we use are among the least affected, but a slowdown of at least 5% is to be expected.</p>
<h2 id="technical-details">Technical details</h2>
<p>The Meltdown attack and the Spectre category of attacks are related to a performance feature of modern processors: branch prediction and speculative execution. Meltdown shows that when an instruction can cause a trap (like the privilege check for user → kernel access), the processor will perform speculative execution: it starts executing the code in case there is no trap, and rolls back if there was one. This attack happens at the boundary between user code and the kernel. Before the processor has completely checked that we have the authorization to run privileged code, it starts executing it. When it turns out we were not authorized, it rolls back the results of that code, but not completely: it can leave some data in the cache. Combined with a technique called a “cache timing attack”, it is then possible to guess the content of the data that was loaded into the cache, bit by bit.</p>
<p>Branch prediction has a related behaviour: when encountering a branch (for example, an if/else expression), the processor starts executing one of the branches before it evaluates the condition, to avoid waiting too long. It guesses which side of the condition is most likely thanks to its branch predictor. Spectre uses branch prediction to cause speculative execution to read out of a buffer’s bounds (among other consequences) in the kernel or another process, then guess the results from the cache.</p>
<p>The Meltdown attack is specific to Intel processors and allows reading from the OS’s memory. Patches are available (the KPTI feature, also named KAISER: <a href="https://lkml.org/lkml/2017/12/4/709">https://lkml.org/lkml/2017/12/4/709</a>). Those patches have a significant impact on syscall performance (<a href="https://www.phoronix.com/scan.php?page=article&amp;item=linux-415-x86pti&amp;num=1">https://www.phoronix.com/scan.php?page=article&amp;item=linux-415-x86pti&amp;num=1</a>), with programs running 5% to 30% slower depending on the workload. Intel Haswell processors with the PCID (Process Context Identifiers) feature get the lowest performance hit (around 5%). We use those processors on Clever Cloud.</p>
<p>Spectre affects processors from Intel, AMD and ARM, and allows reading from the memory of other processes. It looks more like a new category of attacks, for which the issue will have to be fixed individually in each affected piece of software. The only global solution for Spectre is a radical change in processor architecture, which is unlikely to happen soon. We will follow closely any new related vulnerability and promptly patch our infrastructure.</p>
<h3 id="for-further-information">For further information</h3>
<ul>
<li>Papers and explanations about Meltdown and Spectre: <a href="https://spectreattack.com/">https://spectreattack.com/</a></li>
<li>Proofs of concept from Google’s Project Zero team: <a href="https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html">https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html</a></li>
<li>French twitter thread explaining the attacks: <a href="https://twitter.com/fenarinarsa/status/948697105996156928">https://twitter.com/fenarinarsa/status/948697105996156928</a></li>
<li>English twitter thread explaining the attacks: <a href="https://twitter.com/nicoleperlroth/status/948684376249962496">https://twitter.com/nicoleperlroth/status/948684376249962496</a></li>
</ul>
<p>This post has been written by <a href="https://twitter.com/gcouprie">@gcouprie</a> and <a href="https://twitter.com/clementd">@clementd</a>.<br>The Spectre and Meltdown logos are designed by <a href="https://vividfox.me">Natascha Eibl</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/spectre-meltdown-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>Yesterday two issues affecting CPUs have been released to the public.</p>
<p><strong>TL;DR</strong>: the attacks are named Meltdown and Spectre. They allow reading the memory of the OS or of other processes, to steal secrets or gather information for further exploits. Part of the mitigation can significantly affect the performance of running code. In particular, these attacks make it easy to cross container boundaries, and in some cases (not ours) even VM boundaries.</p>
<p>In addition to servers, consumer machines are affected, especially through browsers, so you should definitely update your operating system as well as your browsers.</p>
<span id="more-2812"></span>

<h2 id="what-it-means-for-clever-cloud-users">What it means for Clever Cloud users</h2>
<p>Your applications will be (or already have been) automatically restarted, just like for any other maintenance deployment. The addons will be patched and restarted in place in the following hours. This will generate limited downtime on addons (usually around a minute, depending on the addon&#39;s start-up time).</p>
<p>In addition to restarting virtual machines, we will also need to restart physical machines, as the attacks theoretically allow crossing VM boundaries. This attack is not usable (yet?) on Clever Cloud thanks to our virtualization choices and OS hardening, but we will deploy patches preemptively. Physical machine updates will take place in the following days and will not impact applications. We are still working out the best solution for addons, but it will definitely incur additional downtime for them.</p>
<p>The patches, while mitigating the issues, also come with performance regressions. The impact heavily depends on the workload as well as the exact CPU model. The CPUs we use are among the least affected, but a slowdown of at least 5% is to be expected.</p>
<h2 id="technical-details">Technical details</h2>
<p>The Meltdown attack and the Spectre category of attacks are related to a performance feature of modern processors: branch prediction and speculative execution. Meltdown shows that when an instruction can cause a trap (like the privilege check for user → kernel access), the processor will perform speculative execution: it starts executing the code in case there is no trap, and rolls back if there was one. This attack happens at the boundary between user code and the kernel. Before the processor has completely checked that we have the authorization to run privileged code, it starts executing it. When it turns out we were not authorized, it rolls back the results of that code, but not completely: it can leave some data in the cache. Combined with a technique called a “cache timing attack”, it is then possible to guess the content of the data that was loaded into the cache, bit by bit.</p>
<p>Branch prediction has a related behaviour: when encountering a branch (for example, an if/else expression), the processor starts executing one of the branches before it evaluates the condition, to avoid waiting too long. It guesses which side of the condition is most likely thanks to its branch predictor. Spectre uses branch prediction to cause speculative execution to read out of a buffer’s bounds (among other consequences) in the kernel or another process, then guess the results from the cache.</p>
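<p>To make the branch prediction part concrete, here is a sketch (in Rust, for illustration only; real exploits target specific compiled code) of the classic Spectre bounds-check-bypass pattern. The function and names are hypothetical, not taken from the published proofs of concept.</p>

```rust
// Illustration of the Spectre v1 "bounds check bypass" gadget shape.
// Architecturally this function is safe: out-of-bounds indices return 0.
// Speculatively, however, a mistrained branch predictor can execute the
// body with an out-of-bounds `x`; the dependent load into `probe` then
// leaves a cache footprint indexed by the secret byte that was read.
fn gadget(data: &[u8], probe: &[u8], x: usize) -> u8 {
    if x < data.len() {
        // `secret` may speculatively hold a byte located outside `data`
        let secret = data[x];
        // which 4096-byte page of `probe` becomes cached encodes `secret`;
        // the attacker later times accesses to `probe` to recover it
        probe[secret as usize * 4096]
    } else {
        0
    }
}
```

The rollback described above erases the architectural result (the return value) but not the cache state, and that gap is exactly what these attacks exploit.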
<p>The Meltdown attack is specific to Intel processors and allows reading from the OS’s memory. Patches are available (the KPTI feature, also named KAISER: <a href="https://lkml.org/lkml/2017/12/4/709">https://lkml.org/lkml/2017/12/4/709</a>). Those patches have a significant impact on syscall performance (<a href="https://www.phoronix.com/scan.php?page=article&amp;item=linux-415-x86pti&amp;num=1">https://www.phoronix.com/scan.php?page=article&amp;item=linux-415-x86pti&amp;num=1</a>), with programs running 5% to 30% slower depending on the workload. Intel Haswell processors with the PCID (Process Context Identifiers) feature get the lowest performance hit (around 5%). We use those processors on Clever Cloud.</p>
<p>Spectre affects processors from Intel, AMD and ARM, and allows reading from the memory of other processes. It looks more like a new category of attacks, for which the issue will have to be fixed individually in each affected piece of software. The only global solution for Spectre is a radical change in processor architecture, which is unlikely to happen soon. We will follow closely any new related vulnerability and promptly patch our infrastructure.</p>
<h3 id="for-further-information">For further information</h3>
<ul>
<li>Papers and explanations about Meltdown and Spectre: <a href="https://spectreattack.com/">https://spectreattack.com/</a></li>
<li>Proofs of concept from Google’s Project Zero team: <a href="https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html">https://googleprojectzero.blogspot.fr/2018/01/reading-privileged-memory-with-side.html</a></li>
<li>French twitter thread explaining the attacks: <a href="https://twitter.com/fenarinarsa/status/948697105996156928">https://twitter.com/fenarinarsa/status/948697105996156928</a></li>
<li>English twitter thread explaining the attacks: <a href="https://twitter.com/nicoleperlroth/status/948684376249962496">https://twitter.com/nicoleperlroth/status/948684376249962496</a></li>
</ul>
<p>This post has been written by <a href="https://twitter.com/gcouprie">@gcouprie</a> and <a href="https://twitter.com/clementd">@clementd</a>.<br>The Spectre and Meltdown logos are designed by <a href="https://vividfox.me">Natascha Eibl</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Hot reloading configuration: why and how?</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2017/07/24/hot-reloading-configuration-why-and-how/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Mon, 24 Jul 2017 14:25:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Rust]]></category>
		<category><![CDATA[Sōzu]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2017/07/24/hot-reloading-configuration-why-and-how/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>At Clever Cloud, we are working on <a href="https://www.sozu.io">Sōzu</a>, an HTTP reverse proxy that can change its configuration at runtime, without restarting the process. Why would we need that, you might ask?</p>
<span id="more-2811"></span>

<p>In our architecture, all the applications sit behind a few load balancers with public IP addresses to which our clients point their DNS entries. We used HAProxy for these load balancers, a well known HTTP reverse proxy. Like a lot of other companies, our solution to serve a lot of applications, each with many backend servers, is to generate a configuration file for HAProxy and ask the current process to spawn new workers to handle the new configuration.</p>
<p>Unfortunately, this approach can lose new TCP connections on configuration updates: the old workers might stop accepting connections, but their listen queue might still contain new connections that will not be transferred to the new workers. Worse: if you change the configuration again before the old workers have finished handling their current connections, they are killed, and the live connections stop right there. Or the processes can just pile up and hog resources.</p>
<p>We use an immutable infrastructure approach. For every new version of an application, instead of modifying the backend servers in place, we spawn new ones. This means that we must update the HAProxy configuration to route an application to the new backend servers. And so, we must update the global configuration on every new commit from any of our clients. With multiple configuration changes every second, we&#39;re bound to lose some connections.</p>
<h2 id="hot-configuration-reloading-vs-restarting-processes">Hot configuration reloading VS restarting processes</h2>
<p>That&#39;s why we set out to build a new reverse proxy that could handle configuration changes without losing connections.</p>
<p>First, we decided configuration updates should happen without restarts. The ability to change the configuration of a running process is essential because launching new workers, transferring the current state between them, and removing old workers gracefully, is a task with a high impact on server load and latency. You&#39;re essentially doubling the server&#39;s resource usage, and launching a lot of new processes every time.</p>
<p>We still have to pay this cost for executable upgrades, but those happen less frequently. For that case, we provide ways to do upgrades without downtime, either automatically, or with more hand-holding (you decide when to launch each step) if you want.</p>
<h2 id="shared-and-synchronized-configuration-vs-data-locality">Shared and synchronized configuration VS data locality</h2>
<p>To handle configuration changes, Sōzu uses a master/workers architecture, with the master receiving configuration changes and transmitting them to the workers. Each worker has its own copy of the configuration: if workers used something like a shared memory segment for this, we would need to add cross-process synchronization, make sure that accessing this data is safe, fiddle with pointers, etc. The whole configuration state is not very large (certificates and keys are the biggest part), so we can keep copies in every process. This makes for better data locality and is easier to handle overall.</p>
<h2 id="configuration-changes-push-vs-pull">Configuration changes: push VS pull</h2>
<p>To get configuration updates, there are basically two solutions:</p>
<ul>
<li>The proxy polls new configuration from files or from tools like Kubernetes, etcd, Consul...</li>
<li>The proxy exposes a communication channel to get the new configuration</li>
</ul>
<p>The first solution is essentially what Traefik, the reverse proxy built in Go, does. We chose the second solution because we think it is not the proxy&#39;s responsibility to communicate with those tools. After all, it is just a pipe; it should not have to understand all the existing configuration protocols. So we expose a channel, and we build tools to bridge between the configuration backend and Sōzu.</p>
<p>That way, we do not impose the configuration format, the proxy binary stays small, and anyone can write their own tool to drive it, in any language they want.</p>
<p>The channel is a Unix socket. We decided on this instead of exposing a TCP port on localhost because anybody on the machine could connect to such a port, while a Unix socket has its access protected by filesystem permissions.</p>
<p>The protocol is quite simple: JSON objects representing the proxy orders and their answers, separated by null characters. Writing new tools is as easy as writing directly to a file; you can even write tools in bash. If you want more control, we provide Rust libraries wrapping this channel. Other libraries will appear soon to do the same easily from other languages.</p>
<h2 id="working-with-configuration-diffs-vs-replacing-the-configuration">Working with configuration diffs VS replacing the configuration</h2>
<p>An easy way to implement runtime configuration changes might be to replace the whole configuration every time there&#39;s a change, and start handling the new routing right away. We chose another way: Sōzu works with configuration diffs. The messages you send through the Unix socket contain information like &quot;add this specific certificate&quot;, &quot;add this backend server to this application&quot;, or &quot;remove this domain name for this application&quot;. This is useful because when you replace the whole configuration at once, you lose information.</p>
<p>You might have to remove some openssl contexts still storing old certificates. Or you might want to know when you can drop a backend server: if you tell Sōzu to remove a backend server for an application, it will first tell you that it acknowledges the change and will stop routing new traffic to this server, but will also tell you if there are still connections going on to this server. It will notify you once the connections are gone, so you can safely drop that server.</p>
<p>This also means configuration changes are smaller: instead of loading a complete configuration, you just send a few small messages.</p>
<p>To accommodate this, the configuration protocol is more than request-response: there are three possible answers: Error, Ok or Processing. If the proxy answers Processing, the actual result may come later. It can send other Processing messages to keep you posted on the number of current connections or other issues, and send you Ok or Error after a while.</p>
<h2 id="design-goals-for-a-tool-with-hot-reconfiguration">Design goals for a tool with hot reconfiguration</h2>
<p>We worked for a while on this and had time to explore the requirements for a tool with hot reconfiguration. There are three important points:</p>
<ul>
<li>You have to bake runtime reconfiguration in from the beginning. You cannot retrofit
it correctly into an existing system by messing with backend config switches or other hacks</li>
<li>Do runtime reconfiguration, not process restarts. You might get away with restarts or
blue/green deployments if your configuration does not change often and connections
are short-lived. Our experience shows it does not happen like this</li>
<li>Work with configuration diffs instead of replacing the configuration. Otherwise,
you&#39;re losing important information for your infrastructure</li>
</ul>
<p>We hope this architecture will make it easy to build long-lived and reliable systems, and we hope people will build awesome tools around <a href="https://github.com/sozu-proxy/sozu">Sōzu</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/hot-conf-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>At Clever Cloud, we are working on <a href="https://www.sozu.io">Sōzu</a>, an HTTP reverse proxy that can change its configuration at runtime, without restarting the process. Why would we need that, you might ask?</p>
<span id="more-2811"></span>

<p>In our architecture, all the applications sit behind a few load balancers with public IP addresses to which our clients point their DNS entries. We used HAProxy for these load balancers, a well known HTTP reverse proxy. Like a lot of other companies, our solution to serve a lot of applications, each with many backend servers, is to generate a configuration file for HAProxy and ask the current process to spawn new workers to handle the new configuration.</p>
<p>Unfortunately, this approach can lose new TCP connections on configuration updates: the old workers might stop accepting connections, but their listen queue might still contain new connections that will not be transferred to the new workers. Worse: if you change the configuration again before the old workers have finished handling their current connections, they are killed, and the live connections stop right there. Or the processes can just pile up and hog resources.</p>
<p>We use an immutable infrastructure approach. For every new version of an application, instead of modifying the backend servers in place, we spawn new ones. This means that we must update the HAProxy configuration to route an application to the new backend servers. And so, we must update the global configuration on every new commit from any of our clients. With multiple configuration changes every second, we&#39;re bound to lose some connections.</p>
<h2 id="hot-configuration-reloading-vs-restarting-processes">Hot configuration reloading VS restarting processes</h2>
<p>That&#39;s why we set out to build a new reverse proxy that could handle configuration changes without losing connections.</p>
<p>First, we decided configuration updates should happen without restarts. The ability to change the configuration of a running process is essential because launching new workers, transferring the current state between them, and removing old workers gracefully, is a task with a high impact on server load and latency. You&#39;re essentially doubling the server&#39;s resource usage, and launching a lot of new processes every time.</p>
<p>We still have to pay this cost for executable upgrades, but those happen less frequently. For that case, we provide ways to do upgrades without downtime, either automatically, or with more hand-holding (you decide when to launch each step) if you want.</p>
<h2 id="shared-and-synchronized-configuration-vs-data-locality">Shared and synchronized configuration VS data locality</h2>
<p>To handle configuration changes, Sōzu uses a master/workers architecture, with the master receiving configuration changes and transmitting them to the workers. Each worker has its own copy of the configuration: if workers used something like a shared memory segment for this, we would need to add cross-process synchronization, make sure that accessing this data is safe, fiddle with pointers, etc. The whole configuration state is not very large (certificates and keys are the biggest part), so we can keep copies in every process. This makes for better data locality and is easier to handle overall.</p>
<h2 id="configuration-changes-push-vs-pull">Configuration changes: push VS pull</h2>
<p>To get configuration updates, there are basically two solutions:</p>
<ul>
<li>The proxy polls new configuration from files or from tools like Kubernetes, etcd, Consul...</li>
<li>The proxy exposes a communication channel to get the new configuration</li>
</ul>
<p>The first solution is essentially what Traefik, the reverse proxy built in Go, does. We chose the second solution because we think it is not the proxy&#39;s responsibility to communicate with those tools. After all, it is just a pipe; it should not have to understand all the existing configuration protocols. So we expose a channel, and we build tools to bridge between the configuration backend and Sōzu.</p>
<p>That way, we do not impose the configuration format, the proxy binary stays small, and anyone can write their own tool to drive it, in any language they want.</p>
<p>The channel is a Unix socket. We decided on this instead of exposing a TCP port on localhost because anybody on the machine could connect to such a port, while a Unix socket has its access protected by filesystem permissions.</p>
<p>The protocol is quite simple: JSON objects representing the proxy orders and their answers, separated by null characters. Writing new tools is as easy as writing directly to a file; you can even write tools in bash. If you want more control, we provide Rust libraries wrapping this channel. Other libraries will appear soon to do the same easily from other languages.</p>
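<p>As a sketch of what a driving tool does, the following writes one null-delimited JSON order to any writer. This is an illustration under the framing above, not Sōzu&#39;s actual message schema; a real tool would open a <code>UnixStream</code> to the proxy&#39;s socket instead of a buffer.</p>

```rust
use std::io::Write;

// Write one order as a JSON payload followed by the null separator.
// Generic over `Write`, so it works on a UnixStream connected to the
// proxy just as well as on an in-memory buffer.
fn send_order<W: Write>(out: &mut W, json: &str) -> std::io::Result<()> {
    out.write_all(json.as_bytes())?;
    out.write_all(b"\0")?; // the null character delimits messages
    out.flush()
}
```

A hypothetical call would then look like <code>send_order(&amp;mut stream, r#"{"id":"order-1","type":"ADD_BACKEND"}"#)</code>, with the payload shape defined by the bridging tool and the proxy.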
<h2 id="working-with-configuration-diffs-vs-replacing-the-configuration">Working with configuration diffs VS replacing the configuration</h2>
<p>An easy way to implement runtime configuration changes might be to replace the whole configuration every time there&#39;s a change, and start handling the new routing right away. We chose another way: Sōzu works with configuration diffs. The messages you send through the Unix socket contain information like &quot;add this specific certificate&quot;, &quot;add this backend server to this application&quot;, or &quot;remove this domain name for this application&quot;. This is useful because when you replace the whole configuration at once, you lose information.</p>
<p>You might have to remove some openssl contexts still storing old certificates. Or you might want to know when you can drop a backend server: if you tell Sōzu to remove a backend server for an application, it will first tell you that it acknowledges the change and will stop routing new traffic to this server, but will also tell you if there are still connections going on to this server. It will notify you once the connections are gone, so you can safely drop that server.</p>
<p>This also means configuration changes are smaller: instead of loading a complete configuration, you just send a few small messages.</p>
<p>To accommodate this, the configuration protocol is more than request-response: there are three possible answers: Error, Ok or Processing. If the proxy answers Processing, the actual result may come later. It can send other Processing messages to keep you posted on the number of current connections or other issues, and send you Ok or Error after a while.</p>
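<p>A client driving the channel can therefore be a small loop that treats Processing as progress information and only stops on a final answer. A minimal sketch, with a hypothetical answer type that is not Sōzu&#39;s actual API:</p>

```rust
// Hypothetical answer type mirroring the three-state protocol.
#[derive(Debug, PartialEq)]
enum Answer {
    Ok,
    Error(String),
    Processing(String),
}

// Consume answers until a final Ok or Error arrives; Processing
// messages are intermediate status updates (e.g. remaining connections).
fn wait_for_final<I: IntoIterator<Item = Answer>>(answers: I) -> Result<(), String> {
    for answer in answers {
        match answer {
            Answer::Processing(status) => eprintln!("in progress: {}", status),
            Answer::Ok => return Ok(()),
            Answer::Error(e) => return Err(e),
        }
    }
    Err(String::from("channel closed before a final answer"))
}
```

This is what makes the &quot;tell me when the last connection to that backend is gone&quot; workflow possible: the tool keeps reading Processing updates until the proxy finally answers Ok.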
<h2 id="design-goals-for-a-tool-with-hot-reconfiguration">Design goals for a tool with hot reconfiguration</h2>
<p>We worked for a while on this and had time to explore the requirements for a tool with hot reconfiguration. There are three important points:</p>
<ul>
<li>You have to bake runtime reconfiguration in from the beginning. You cannot retrofit
it correctly into an existing system by messing with backend config switches or other hacks</li>
<li>Do runtime reconfiguration, not process restarts. You might get away with restarts or
blue/green deployments if your configuration does not change often and connections
are short-lived. Our experience shows it does not happen like this</li>
<li>Work with configuration diffs instead of replacing the configuration. Otherwise,
you&#39;re losing important information for your infrastructure</li>
</ul>
<p>We hope this architecture will make it easy to build long-lived and reliable systems, and we hope people will build awesome tools around <a href="https://github.com/sozu-proxy/sozu">Sōzu</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Async, Futures, AMQP, pick three</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2017/03/28/lapin-new-rust-amqp-library/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Tue, 28 Mar 2017 19:34:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Rust]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2017/03/28/lapin-new-rust-amqp-library/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>A few weeks ago, we set out to develop an <a href="https://github.com/geal/lapin">AMQP client library in Rust</a>, and I&#39;m happy to release it now! We will integrate it in more and more of our tools in the future.</p>
<span id="more-2810"></span>

<h2 id="design-a-futures-based-api-and-a-low-level-api">Design: a futures based API and a low level API</h2>
<p>One of our goals was to leverage <a href="http://tokio.rs">tokio</a> and <a href="https://github.com/alexcrichton/futures-rs">futures</a> to make an API that is easy to use, while also allowing lower-level implementations that drive an event loop directly with something like <a href="https://github.com/carllerche/mio/">mio</a>.</p>
<p>This was a bit challenging, but we ended up with two crates:</p>
<ul>
<li><a href="https://crates.io/crates/lapin-async">lapin-async</a>, the low level library</li>
<li><a href="https://crates.io/crates/lapin-futures">lapin-futures</a> wrapping lapin-async in futures</li>
</ul>
<p>The resulting code can work with tokio-core&#39;s event reactor, or even <a href="https://github.com/alexcrichton/futures-rs/tree/master/futures-cpupool">futures-cpupool</a>.</p>
<p>For the network frame format, the libraries use <a href="https://github.com/Geal/nom">nom</a>, the Rust parser combinators library, and <a href="https://github.com/geal/cookie-factory">cookie-factory</a>, an experimental serialization library following the same approach as nom. The code is a great example of employing <a href="https://github.com/Geal/lapin/blob/master/futures/src/transport.rs">nom inside a tokio transport</a>, and of integrating a complex protocol&#39;s state machine directly with <a href="https://github.com/tokio-rs/tokio-io">tokio-io</a>. We will release a tutorial on how to write such a protocol soon.</p>
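<p>To give an idea of what the frame parsing involves, here is a hand-rolled sketch of reading an AMQP 0.9.1 frame: a 1-byte type, a 2-byte channel id, a 4-byte big-endian payload size, the payload, and a 0xCE frame-end octet. The real code expresses this with nom combinators and does far more validation; this simplified version is for illustration only.</p>

```rust
// A parsed AMQP frame, borrowing its payload from the input buffer.
#[derive(Debug, PartialEq)]
struct Frame<'a> {
    frame_type: u8,
    channel: u16,
    payload: &'a [u8],
}

// Try to parse one frame from `input`; on success, return the frame
// and the remaining bytes. `None` means "need more data or invalid".
fn parse_frame(input: &[u8]) -> Option<(Frame<'_>, &[u8])> {
    // header: type (1 byte), channel (2 bytes BE), size (4 bytes BE)
    if input.len() < 7 {
        return None;
    }
    let frame_type = input[0];
    let channel = u16::from_be_bytes([input[1], input[2]]);
    let size = u32::from_be_bytes([input[3], input[4], input[5], input[6]]) as usize;
    // payload must be fully buffered and followed by the 0xCE frame-end octet
    if input.len() < 7 + size + 1 || input[7 + size] != 0xCE {
        return None;
    }
    let payload = &input[7..7 + size];
    Some((Frame { frame_type, channel, payload }, &input[7 + size + 1..]))
}
```

The "return the rest of the input" shape is what makes such parsers compose well with a streaming transport: incomplete input is not an error, just a signal to read more bytes.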
<p>The libraries are also designed to be completely independent of the network stream: you can use a basic TCP stream, a TLS stream or a Unix socket, and you won&#39;t be blocked by rust-openssl version conflicts between libraries (which was a big issue for us).</p>
<h2 id="using-the-futures-api-publishing-a-message">Using the futures API: publishing a message</h2>
<p>Every method returns a future, letting you chain them: the <code>connect</code> result yields a working client once the complete AMQP handshake has been performed, the channel becomes available once the server has answered, and so on. The nature of AMQP also makes parallel work on the same connection easy.</p>
<pre><code class="language-rust">#[macro_use] extern crate log;
extern crate futures;
extern crate tokio_core;
extern crate lapin_futures as lapin;

use std::default::Default;
use futures::Stream;
use futures::future::Future;
use tokio_core::reactor::Core;
use tokio_core::net::TcpStream;
use lapin::client::ConnectionOptions;
use lapin::channel::{BasicProperties,BasicPublishOptions,QueueDeclareOptions};

fn main() {

  // create the reactor
  let mut core = Core::new().unwrap();
  let handle = core.handle();
  let addr = &quot;127.0.0.1:5672&quot;.parse().unwrap();

  core.run(

    TcpStream::connect(&amp;addr, &amp;handle).and_then(|stream| {

      // connect() returns a future of an AMQP Client
      // that resolves once the handshake is done
      lapin::client::Client::connect(
        stream,
        &amp;ConnectionOptions{
          username: &quot;guest&quot;,
          password: &quot;guest&quot;,
          ..Default::default()
        }
      )
    }).and_then(|client| {

      // create_channel returns a future that is resolved
      // once the channel is successfully created
      client.create_channel()
    }).and_then(|channel| {
      let id = channel.id;
      info!(&quot;created channel with id: {}&quot;, id);

      channel.queue_declare(&quot;hello&quot;, &amp;QueueDeclareOptions::default()).and_then(move |_| {
        info!(&quot;channel {} declared queue {}&quot;, id, &quot;hello&quot;);

        channel.basic_publish(
          &quot;hello&quot;,
          b&quot;hello from tokio&quot;,
          &amp;BasicPublishOptions::default(),
          BasicProperties::default().with_user_id(&quot;guest&quot;.to_string()).with_reply_to(&quot;foobar&quot;.to_string())
        )
      })
    })
  ).unwrap();
}
</code></pre>
<p>Every struct of the API, be it a client, channel or consumer, holds a synchronized reference to the underlying transport, so you can call it from any thread.</p>
<h2 id="using-the-futures-api-creating-a-consumer">Using the futures API: creating a consumer</h2>
<p>When you call the <code>basic_consume</code> method, it returns a future of a <code>Consumer</code>. The consumer implements <code>Stream</code>, so you can reuse all the related combinators from the futures library.</p>
<pre><code class="language-rust">#[macro_use] extern crate log;
extern crate futures;
extern crate tokio_core;
extern crate lapin_futures as lapin;

use futures::Stream;
use futures::future::Future;
use tokio_core::reactor::Core;
use tokio_core::net::TcpStream;
use lapin::client::ConnectionOptions;
use lapin::channel::{BasicConsumeOptions,QueueDeclareOptions};

fn main() {

  // create the reactor
  let mut core = Core::new().unwrap();
  let handle = core.handle();
  let addr = &quot;127.0.0.1:5672&quot;.parse().unwrap();

  core.run(

    TcpStream::connect(&amp;addr, &amp;handle).and_then(|stream| {
      lapin::client::Client::connect(stream, &amp;ConnectionOptions::default())
    }).and_then(|client| {

      client.create_channel()
    }).and_then(|channel| {

      let id = channel.id;
      info!(&quot;created channel with id: {}&quot;, id);

      let ch = channel.clone();
      channel.queue_declare(&quot;hello&quot;, &amp;QueueDeclareOptions::default()).and_then(move |_| {
        info!(&quot;channel {} declared queue {}&quot;, id, &quot;hello&quot;);

        channel.basic_consume(&quot;hello&quot;, &quot;my_consumer&quot;, &amp;BasicConsumeOptions::default())
      }).and_then(|stream| {
        info!(&quot;got consumer stream&quot;);

        stream.for_each(|message| {
          debug!(&quot;got message: {:?}&quot;, message);
          info!(&quot;decoded message: {:?}&quot;, std::str::from_utf8(&amp;message.data).unwrap());

          ch.basic_ack(message.delivery_tag);
          Ok(())
        })
      })
    })
  ).unwrap();
}
</code></pre>
<h2 id="looking-under-the-hood-lapin-async">Looking under the hood: lapin-async</h2>
<p>The lapin-async library is meant to be used with an event loop that tells you when you can read or write on the underlying stream. As such, it owns neither the network stream nor the read and write buffers. You handle your IO, then pass the buffers to the protocol&#39;s state machine. It will update its state, tell you how much data it consumed, and give you data to send to the network. You can then query it for state changes.</p>
<p>There are various reasons for an architecture like this one:</p>
<ul>
<li>a library that owns the IO stream usually does not play well with event loops</li>
<li>the developer might want to make their own optimizations with sockets and buffers</li>
<li>separating the IO makes the library easy to test: you can pass buffers (or even
complete structs) to the state machine and verify the expected state easily</li>
</ul>
<p>More generally, a protocol library should not dictate how the application handles its networking.</p>
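<p>To make that separation concrete, here is a minimal, self-contained sketch of a state machine that owns neither socket nor buffers; the types and handshake bytes are illustrative stand-ins, not lapin-async&#39;s real API:</p>

```rust
// A protocol state machine that owns neither the socket nor the IO loop:
// the caller performs all reads and writes and feeds bytes in.

#[derive(Debug, PartialEq, Clone)]
enum State {
    Initial,
    SentProtocolHeader,
    Connected,
}

struct Machine {
    state: State,
    // bytes the caller is responsible for writing to the network
    output: Vec<u8>,
}

impl Machine {
    fn new() -> Machine {
        Machine { state: State::Initial, output: Vec::new() }
    }

    // start the handshake: queue the protocol header for sending
    fn connect(&mut self) -> State {
        self.output.extend_from_slice(b"AMQP\x00\x00\x09\x01");
        self.state = State::SentProtocolHeader;
        self.state.clone()
    }

    // consume bytes the caller received from the network; return how many
    // were used, so the caller can manage its own buffer
    fn handle_input(&mut self, input: &[u8]) -> usize {
        if self.state == State::SentProtocolHeader && input.starts_with(b"OK") {
            self.state = State::Connected;
            2
        } else {
            0
        }
    }
}
```

<p>The caller decides when to read and write; the machine only consumes and produces bytes, which is exactly what makes it easy to test without a network.</p>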
<p>As an example of how it could run:</p>
<pre><code class="language-rust">let mut stream = TcpStream::connect(&quot;127.0.0.1:5672&quot;).unwrap();
stream.set_nonblocking(true);

let capacity = 8192;
let mut send_buffer    = Buffer::with_capacity(capacity as usize);
let mut receive_buffer = Buffer::with_capacity(capacity as usize);

let mut conn: Connection = Connection::new();
assert_eq!(conn.connect().unwrap(), ConnectionState::Connecting(ConnectingState::SentProtocolHeader));
loop {
  match conn.run(&amp;mut stream, &amp;mut send_buffer, &amp;mut receive_buffer) {
    Err(e) =&gt; panic!(&quot;could not connect: {:?}&quot;, e),
    Ok(ConnectionState::Connected) =&gt; break,
    Ok(state) =&gt; info!(&quot;now at state {:?}, continue&quot;, state),
  }
  thread::sleep(time::Duration::from_millis(100));
}
info!(&quot;CONNECTED&quot;);
</code></pre>
<p>The <code>run</code> method is a helper that reads from the network, parses frames, updates the internal state, and writes new frames to the network. We loop until the state switches to &quot;connected&quot;. Most of the behaviour follows that model.</p>
<p>While the lapin-async library has most of the functionality, it still requires a lot of manual work to manage, so you should prefer the futures based library.</p>
<h2 id="a-young-library">A young library</h2>
<p>This is an early release, and it is missing a lot of features, but the design makes them easy to implement.</p>
<p>Right now, the only authentication method is &quot;plain&quot;; you can create and close channels, create queues (without options), and use the <a href="https://docs.rs/lapin-futures/0.3.0/lapin_futures/channel/struct.Channel.html">methods from the &quot;basic&quot; AMQP class</a>. RabbitMQ&#39;s <a href="https://www.rabbitmq.com/confirms.html">&quot;publisher confirms&quot; extension</a> is also available.</p>
<p>It is mainly missing the &quot;nack&quot; extension, and the exchange and transaction handling methods.</p>
<p>More features will come in the following weeks, and if you want to <a href="https://github.com/geal/lapin">contribute</a>, you&#39;re very welcome :)</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/lapin-banner-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>A few weeks ago, we set out to develop an <a href="https://github.com/geal/lapin">AMQP client library in Rust</a>, and I&#39;m happy to release it now! We will integrate it in more and more of our tools in the future.</p>
<span id="more-2810"></span>

<h2 id="design-a-futures-based-api-and-a-low-level-api">Design: a futures based API and a low level API</h2>
<p>One of our goals was to leverage <a href="http://tokio.rs">tokio</a> and <a href="https://github.com/alexcrichton/futures-rs">futures</a> to make an API that is easy to use, while still allowing lower level implementations that drive an event loop directly with something like <a href="https://github.com/carllerche/mio/">mio</a>.</p>
<p>This was a bit challenging, but we ended up with two crates:</p>
<ul>
<li><a href="https://crates.io/crates/lapin-async">lapin-async</a>, the low level library</li>
<li><a href="https://crates.io/crates/lapin-futures">lapin-futures</a> wrapping lapin-async in futures</li>
</ul>
<p>The resulting code can work with tokio-core&#39;s event reactor, or even <a href="https://github.com/alexcrichton/futures-rs/tree/master/futures-cpupool">futures-cpupool</a>.</p>
<p>For the network frame format, the libraries use <a href="https://github.com/Geal/nom">nom</a>, the Rust parser combinator library, and <a href="https://github.com/geal/cookie-factory">cookie-factory</a>, an experimental serialization library that follows the same approach as nom. The code is a great example of employing <a href="https://github.com/Geal/lapin/blob/master/futures/src/transport.rs">nom inside a tokio transport</a>, and of integrating a complex protocol&#39;s state machine directly with <a href="https://github.com/tokio-rs/tokio-io">tokio-io</a>. We will soon release a tutorial on how to write such a protocol.</p>
<p>The libraries are also designed to be completely independent of the network stream: you can use a plain TCP stream, a TLS stream or a Unix socket, and you won&#39;t be blocked by rust-openssl version conflicts between libraries (which was a big issue for us).</p>
<h2 id="using-the-futures-api-publishing-a-message">Using the futures API: publishing a message</h2>
<p>Every method returns a future, letting you chain them: the <code>connect</code> result yields a working client once the complete AMQP handshake has been performed, the channel becomes available once the server has answered, and so on. The nature of AMQP also makes parallel work on the same connection easy.</p>
<pre><code class="language-rust">#[macro_use] extern crate log;
extern crate futures;
extern crate tokio_core;
extern crate lapin_futures as lapin;

use std::default::Default;
use futures::Stream;
use futures::future::Future;
use tokio_core::reactor::Core;
use tokio_core::net::TcpStream;
use lapin::client::ConnectionOptions;
use lapin::channel::{BasicProperties,BasicPublishOptions,QueueDeclareOptions};

fn main() {

  // create the reactor
  let mut core = Core::new().unwrap();
  let handle = core.handle();
  let addr = &quot;127.0.0.1:5672&quot;.parse().unwrap();

  core.run(

    TcpStream::connect(&amp;addr, &amp;handle).and_then(|stream| {

      // connect() returns a future of an AMQP Client
      // that resolves once the handshake is done
      lapin::client::Client::connect(
        stream,
        &amp;ConnectionOptions{
          username: &quot;guest&quot;,
          password: &quot;guest&quot;,
          ..Default::default()
        }
      )
    }).and_then(|client| {

      // create_channel returns a future that is resolved
      // once the channel is successfully created
      client.create_channel()
    }).and_then(|channel| {
      let id = channel.id;
      info!(&quot;created channel with id: {}&quot;, id);

      channel.queue_declare(&quot;hello&quot;, &amp;QueueDeclareOptions::default()).and_then(move |_| {
        info!(&quot;channel {} declared queue {}&quot;, id, &quot;hello&quot;);

        channel.basic_publish(
          &quot;hello&quot;,
          b&quot;hello from tokio&quot;,
          &amp;BasicPublishOptions::default(),
          BasicProperties::default().with_user_id(&quot;guest&quot;.to_string()).with_reply_to(&quot;foobar&quot;.to_string())
        )
      })
    })
  ).unwrap();
}
</code></pre>
<p>Every struct of the API, be it a client, channel or consumer, holds a synchronized reference to the underlying transport, so you can call it from any thread.</p>
<h2 id="using-the-futures-api-creating-a-consumer">Using the futures API: creating a consumer</h2>
<p>When you call the <code>basic_consume</code> method, it returns a future of a <code>Consumer</code>. The consumer implements <code>Stream</code>, so you can reuse all the related combinators from the futures library.</p>
<pre><code class="language-rust">#[macro_use] extern crate log;
extern crate futures;
extern crate tokio_core;
extern crate lapin_futures as lapin;

use futures::Stream;
use futures::future::Future;
use tokio_core::reactor::Core;
use tokio_core::net::TcpStream;
use lapin::client::ConnectionOptions;
use lapin::channel::{BasicConsumeOptions,QueueDeclareOptions};

fn main() {

  // create the reactor
  let mut core = Core::new().unwrap();
  let handle = core.handle();
  let addr = &quot;127.0.0.1:5672&quot;.parse().unwrap();

  core.run(

    TcpStream::connect(&amp;addr, &amp;handle).and_then(|stream| {
      lapin::client::Client::connect(stream, &amp;ConnectionOptions::default())
    }).and_then(|client| {

      client.create_channel()
    }).and_then(|channel| {

      let id = channel.id;
      info!(&quot;created channel with id: {}&quot;, id);

      let ch = channel.clone();
      channel.queue_declare(&quot;hello&quot;, &amp;QueueDeclareOptions::default()).and_then(move |_| {
        info!(&quot;channel {} declared queue {}&quot;, id, &quot;hello&quot;);

        channel.basic_consume(&quot;hello&quot;, &quot;my_consumer&quot;, &amp;BasicConsumeOptions::default())
      }).and_then(|stream| {
        info!(&quot;got consumer stream&quot;);

        stream.for_each(|message| {
          debug!(&quot;got message: {:?}&quot;, message);
          info!(&quot;decoded message: {:?}&quot;, std::str::from_utf8(&amp;message.data).unwrap());

          ch.basic_ack(message.delivery_tag);
          Ok(())
        })
      })
    })
  ).unwrap();
}
</code></pre>
<h2 id="looking-under-the-hood-lapin-async">Looking under the hood: lapin-async</h2>
<p>The lapin-async library is meant to be used with an event loop that tells you when you can read or write on the underlying stream. As such, it owns neither the network stream nor the read and write buffers. You handle your IO, then pass the buffers to the protocol&#39;s state machine. It will update its state, tell you how much data it consumed, and give you data to send to the network. You can then query it for state changes.</p>
<p>There are various reasons for an architecture like this one:</p>
<ul>
<li>a library that owns the IO stream usually does not play well with event loops</li>
<li>the developer might want to make their own optimizations with sockets and buffers</li>
<li>separating the IO makes the library easy to test: you can pass buffers (or even
complete structs) to the state machine and verify the expected state easily</li>
</ul>
<p>More generally, a protocol library should not dictate how the application handles its networking.</p>
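<p>To make that separation concrete, here is a minimal, self-contained sketch of a state machine that owns neither socket nor buffers; the types and handshake bytes are illustrative stand-ins, not lapin-async&#39;s real API:</p>

```rust
// A protocol state machine that owns neither the socket nor the IO loop:
// the caller performs all reads and writes and feeds bytes in.

#[derive(Debug, PartialEq, Clone)]
enum State {
    Initial,
    SentProtocolHeader,
    Connected,
}

struct Machine {
    state: State,
    // bytes the caller is responsible for writing to the network
    output: Vec<u8>,
}

impl Machine {
    fn new() -> Machine {
        Machine { state: State::Initial, output: Vec::new() }
    }

    // start the handshake: queue the protocol header for sending
    fn connect(&mut self) -> State {
        self.output.extend_from_slice(b"AMQP\x00\x00\x09\x01");
        self.state = State::SentProtocolHeader;
        self.state.clone()
    }

    // consume bytes the caller received from the network; return how many
    // were used, so the caller can manage its own buffer
    fn handle_input(&mut self, input: &[u8]) -> usize {
        if self.state == State::SentProtocolHeader && input.starts_with(b"OK") {
            self.state = State::Connected;
            2
        } else {
            0
        }
    }
}
```

<p>The caller decides when to read and write; the machine only consumes and produces bytes, which is exactly what makes it easy to test without a network.</p>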
<p>As an example of how it could run:</p>
<pre><code class="language-rust">let mut stream = TcpStream::connect(&quot;127.0.0.1:5672&quot;).unwrap();
stream.set_nonblocking(true);

let capacity = 8192;
let mut send_buffer    = Buffer::with_capacity(capacity as usize);
let mut receive_buffer = Buffer::with_capacity(capacity as usize);

let mut conn: Connection = Connection::new();
assert_eq!(conn.connect().unwrap(), ConnectionState::Connecting(ConnectingState::SentProtocolHeader));
loop {
  match conn.run(&amp;mut stream, &amp;mut send_buffer, &amp;mut receive_buffer) {
    Err(e) =&gt; panic!(&quot;could not connect: {:?}&quot;, e),
    Ok(ConnectionState::Connected) =&gt; break,
    Ok(state) =&gt; info!(&quot;now at state {:?}, continue&quot;, state),
  }
  thread::sleep(time::Duration::from_millis(100));
}
info!(&quot;CONNECTED&quot;);
</code></pre>
<p>The <code>run</code> method is a helper that reads from the network, parses frames, updates the internal state, and writes new frames to the network. We loop until the state switches to &quot;connected&quot;. Most of the behaviour follows that model.</p>
<p>While the lapin-async library has most of the functionality, it still requires a lot of manual work to manage, so you should prefer the futures based library.</p>
<h2 id="a-young-library">A young library</h2>
<p>This is an early release, and it is missing a lot of features, but the design makes them easy to implement.</p>
<p>Right now, the only authentication method is &quot;plain&quot;; you can create and close channels, create queues (without options), and use the <a href="https://docs.rs/lapin-futures/0.3.0/lapin_futures/channel/struct.Channel.html">methods from the &quot;basic&quot; AMQP class</a>. RabbitMQ&#39;s <a href="https://www.rabbitmq.com/confirms.html">&quot;publisher confirms&quot; extension</a> is also available.</p>
<p>It is mainly missing the &quot;nack&quot; extension, and the exchange and transaction handling methods.</p>
<p>More features will come in the following weeks, and if you want to <a href="https://github.com/geal/lapin">contribute</a>, you&#39;re very welcome :)</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Falling for Rust</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2016/11/02/falling-for-rust/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Wed, 02 Nov 2016 16:02:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Rust]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2016/11/02/falling-for-rust/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-1368x528.png 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>If you ever talked to me, or looked at my Twitter feed, you may have noticed that I campaign loudly for the Rust programming language. I am not going to stop. In fact, it will only go crescendo!</p>
<span id="more-2809"></span>

<p>I got involved with that language <a href="https://github.com/Geal/rust-csv/commit/adaf9f09a3b3aa4889dc0213c88a83ae02219115">a while ago</a>, when I was looking for a safe alternative to C, for my research for the <a href="http://videolan.org/">Videolan</a> project. It was an interesting time, since the language changed often. I basically had to rewrite most of my code every two weeks. But since the release of Rust 1.0 in May 2015, it has gone from being an experimental tool to a serious contender for other system programming platforms.</p>
<p>I&#39;m no longer the only one talking about Rust in the office, and the funny thing is that I&#39;m not the first one to put some Rust in production! <a href="https://github.com/Keruspe">Marc-Antoine</a> started rewriting some management tools from bash to Rust, and they were working so well that they quickly ended up running on the platform.</p>
<h2 id="why-we-are-betting-on-rust-for-the-future">Why we are betting on Rust for the future</h2>
<p>We have always used heterogeneous technologies to manage the platform, mainly because we must know how they behave to host them properly. We test and use everything, and running them on our platform gives us the long term view.</p>
<p>For us, Rust gave substantial benefits from the beginning, and showed it is adapted to the way we run production code.</p>
<h2 id="static-binaries">Static binaries</h2>
<p>Like C, C++ and Go, Rust can build single file binaries that you can upload to a server and run directly, without any installed dependencies except a libc. This is significant: when you want to reduce the base disk size and the boot time of a virtual machine, installing a lot of dependencies, like a Python virtual machine and its standard library, quickly takes its toll.</p>
<p>We built an immutable infrastructure: we do not modify virtual machines directly; instead, we modify the base image and start new machines from scratch. So we&#39;re not too concerned about updating a dynamic library separately from the executable loading it.</p>
<h2 id="reliability">Reliability</h2>
<p>Memory safety is one of Rust&#39;s biggest selling points, but it goes further than that. There&#39;s a big emphasis on providing safe patterns everywhere. As an example, data is immutable by default: you need to explicitly add <code>mut</code> to a variable to be able to change it. Most APIs return an <code>Option</code> or <code>Result</code> type to safely wrap an actual result or an error, and there are lots of ways to manipulate them easily. Ignoring an error means explicitly adding an <code>unwrap()</code> call to that function&#39;s result.</p>
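<p>These defaults can be sketched with nothing but the standard library; the function names here are made up for illustration:</p>

```rust
use std::num::ParseIntError;

// Parsing returns a Result: the caller must handle the error branch,
// or opt out loudly with unwrap().
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn describe(port_input: &str) -> String {
    // data is immutable by default: `mut` is required to modify `message`
    let mut message = String::from("port: ");
    match parse_port(port_input) {
        Ok(port) => message.push_str(&port.to_string()),
        Err(_) => message.push_str("invalid"),
    }
    message
}
```

<p>Dropping the <code>mut</code> or leaving out the <code>Err</code> arm is a compile error, not a runtime surprise.</p>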
<p>All this means that by default, you write Rust code that should not crash, and that should not modify data unless explicitly stated. Add in a good type system to correctly represent and check assumptions, and some unit tests, because we know type systems don&#39;t replace them (although you don&#39;t need to write as many unit tests as in languages with simpler type systems).</p>
<p>You then get code that will rarely fail, and in which you can easily find the parts that break assumptions (example: grep for unsafe and unwrap). There will still be bugs, but you will work more on functional parts than on plumbing.</p>
<h2 id="stability">Stability</h2>
<p>Rust has no garbage collection. Most of the time, you won&#39;t care about this, because the benefit of using a language with a garbage collector outweighs the usual concerns, like performance. Not caring about memory allocation lets you write code without fear of most memory vulnerabilities, and avoids most instances of memory leaks (sadly, not all of them). The real issue in production is that the garbage collector disturbs the program&#39;s behaviour by regularly executing a task that goes through memory to detect what to deallocate. Of course, it&#39;s not always as simple as a mark and sweep, but the GC will still take some CPU time and introduce latency in other tasks. Under intense memory pressure (say, a lot of requests to handle), keeping lots of unusable memory around or stopping other tasks can trigger catastrophic failures. When the issue appears, people typically spend a lot of time tweaking GC settings or rewriting code to avoid it. Either way, it&#39;s a lot of work.</p>
<p>Rust avoids garbage collection by resolving memory allocation precisely at compile time. You get the same benefits without the runtime cost. The result? An application with predictable CPU and RAM usage. Typically, the RAM usage graphs will be flat, compared to the sawtooth graphs of garbage collected runtimes. Predictability is a key feature for stable production systems: you can make more assumptions about the runtime behaviour and plan for resource usage.</p>
<p>On that point, you get another benefit: it is easy to set boundaries on resource usage. If I know roughly how much memory I need for X concurrent requests, and I know the capacity of my server, I can put a soft limit on the number of concurrent requests, at which I set up an alert, and a hard limit, because I know that past it the server will just stop answering properly.</p>
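<p>As a sketch of that idea (hypothetical types, standard library only), an admission gate with a soft and a hard limit could look like this:</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A hypothetical admission gate: warn past a soft limit, refuse past a
// hard one. Just the shape of the idea, not production code.
#[derive(Debug, PartialEq)]
enum Admission {
    Accepted,
    AcceptedWithAlert, // soft limit crossed: time to fire an alert
    Rejected,          // hard limit crossed: shed load instead of dying
}

struct Gate {
    current: AtomicUsize,
    soft: usize,
    hard: usize,
}

impl Gate {
    fn new(soft: usize, hard: usize) -> Gate {
        Gate { current: AtomicUsize::new(0), soft, hard }
    }

    fn try_enter(&self) -> Admission {
        let n = self.current.fetch_add(1, Ordering::SeqCst) + 1;
        if n > self.hard {
            // roll back the counter: this request is not admitted
            self.current.fetch_sub(1, Ordering::SeqCst);
            Admission::Rejected
        } else if n > self.soft {
            Admission::AcceptedWithAlert
        } else {
            Admission::Accepted
        }
    }

    fn leave(&self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }
}
```

<p>Because memory per request is predictable, the limits can be derived from the server&#39;s capacity instead of guessed.</p>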
<h2 id="not-worrying-about-it">Not worrying about it</h2>
<p>For us, these benefits make a strong case for Rust as a reliable building block for a production platform. This is the piece of code we don&#39;t have to worry about, and it will enable others to run safely.</p>
<h2 id="you-can-do-it-too">You can do it, too!</h2>
<p>Right now, we are replacing small parts of the infrastructure, and we will soon unveil more interesting tools. In the meantime, we added <a href="https://stagingv6.cleverapps.io/doc/rust/rust/">Rust support in beta on our platform</a>: now you can deploy Rust web applications in a few commands.</p>
<p>Go ahead and put Rust in production!</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1.png 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-300x116.png 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-1024x395.png 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-768x296.png 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/falling-rust-1-1368x528.png 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>If you ever talked to me, or looked at my Twitter feed, you may have noticed that I campaign loudly for the Rust programming language. I am not going to stop. In fact, it will only go crescendo!</p>
<span id="more-2809"></span>

<p>I got involved with that language <a href="https://github.com/Geal/rust-csv/commit/adaf9f09a3b3aa4889dc0213c88a83ae02219115">a while ago</a>, when I was looking for a safe alternative to C, for my research for the <a href="http://videolan.org/">Videolan</a> project. It was an interesting time, since the language changed often. I basically had to rewrite most of my code every two weeks. But since the release of Rust 1.0 in May 2015, it has gone from being an experimental tool to a serious contender for other system programming platforms.</p>
<p>I&#39;m no longer the only one talking about Rust in the office, and the funny thing is that I&#39;m not the first one to put some Rust in production! <a href="https://github.com/Keruspe">Marc-Antoine</a> started rewriting some management tools from bash to Rust, and they were working so well that they quickly ended up running on the platform.</p>
<h2 id="why-we-are-betting-on-rust-for-the-future">Why we are betting on Rust for the future</h2>
<p>We have always used heterogeneous technologies to manage the platform, mainly because we must know how they behave to host them properly. We test and use everything, and running them on our platform gives us the long term view.</p>
<p>For us, Rust gave substantial benefits from the beginning, and showed it is adapted to the way we run production code.</p>
<h2 id="static-binaries">Static binaries</h2>
<p>Like C, C++ and Go, Rust can build single file binaries that you can upload to a server and run directly, without any installed dependencies except a libc. This is significant: when you want to reduce the base disk size and the boot time of a virtual machine, installing a lot of dependencies, like a Python virtual machine and its standard library, quickly takes its toll.</p>
<p>We built an immutable infrastructure: we do not modify virtual machines directly; instead, we modify the base image and start new machines from scratch. So we&#39;re not too concerned about updating a dynamic library separately from the executable loading it.</p>
<h2 id="reliability">Reliability</h2>
<p>Memory safety is one of Rust&#39;s biggest selling points, but it goes further than that. There&#39;s a big emphasis on providing safe patterns everywhere. As an example, data is immutable by default: you need to explicitly add <code>mut</code> to a variable to be able to change it. Most APIs return an <code>Option</code> or <code>Result</code> type to safely wrap an actual result or an error, and there are lots of ways to manipulate them easily. Ignoring an error means explicitly adding an <code>unwrap()</code> call to that function&#39;s result.</p>
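<p>These defaults can be sketched with nothing but the standard library; the function names here are made up for illustration:</p>

```rust
use std::num::ParseIntError;

// Parsing returns a Result: the caller must handle the error branch,
// or opt out loudly with unwrap().
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn describe(port_input: &str) -> String {
    // data is immutable by default: `mut` is required to modify `message`
    let mut message = String::from("port: ");
    match parse_port(port_input) {
        Ok(port) => message.push_str(&port.to_string()),
        Err(_) => message.push_str("invalid"),
    }
    message
}
```

<p>Dropping the <code>mut</code> or leaving out the <code>Err</code> arm is a compile error, not a runtime surprise.</p>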
<p>All this means that by default, you write Rust code that should not crash, and that should not modify data unless explicitly stated. Add in a good type system to correctly represent and check assumptions, and some unit tests, because we know type systems don&#39;t replace them (although you don&#39;t need to write as many unit tests as in languages with simpler type systems).</p>
<p>You then get code that will rarely fail, and in which you can easily find the parts that break assumptions (example: grep for unsafe and unwrap). There will still be bugs, but you will work more on functional parts than on plumbing.</p>
<h2 id="stability">Stability</h2>
<p>Rust has no garbage collection. Most of the time, you won&#39;t care about this, because the benefit of using a language with a garbage collector outweighs the usual concerns, like performance. Not caring about memory allocation lets you write code without fear of most memory vulnerabilities, and avoids most instances of memory leaks (sadly, not all of them). The real issue in production is that the garbage collector disturbs the program&#39;s behaviour by regularly executing a task that goes through memory to detect what to deallocate. Of course, it&#39;s not always as simple as a mark and sweep, but the GC will still take some CPU time and introduce latency in other tasks. Under intense memory pressure (say, a lot of requests to handle), keeping lots of unusable memory around or stopping other tasks can trigger catastrophic failures. When the issue appears, people typically spend a lot of time tweaking GC settings or rewriting code to avoid it. Either way, it&#39;s a lot of work.</p>
<p>Rust avoids garbage collection by resolving memory allocation precisely at compile time. You get the same benefits without the runtime cost. The result? An application with predictable CPU and RAM usage. Typically, the RAM usage graphs will be flat, compared to the sawtooth graphs of garbage collected runtimes. Predictability is a key feature for stable production systems: you can make more assumptions about the runtime behaviour and plan for resource usage.</p>
<p>On that point, you get another benefit: it is easy to set boundaries on resource usage. If I know roughly how much memory I need for X concurrent requests, and I know the capacity of my server, I can put a soft limit on the number of concurrent requests, at which I set up an alert, and a hard limit, because I know that past it the server will just stop answering properly.</p>
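<p>As a sketch of that idea (hypothetical types, standard library only), an admission gate with a soft and a hard limit could look like this:</p>

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A hypothetical admission gate: warn past a soft limit, refuse past a
// hard one. Just the shape of the idea, not production code.
#[derive(Debug, PartialEq)]
enum Admission {
    Accepted,
    AcceptedWithAlert, // soft limit crossed: time to fire an alert
    Rejected,          // hard limit crossed: shed load instead of dying
}

struct Gate {
    current: AtomicUsize,
    soft: usize,
    hard: usize,
}

impl Gate {
    fn new(soft: usize, hard: usize) -> Gate {
        Gate { current: AtomicUsize::new(0), soft, hard }
    }

    fn try_enter(&self) -> Admission {
        let n = self.current.fetch_add(1, Ordering::SeqCst) + 1;
        if n > self.hard {
            // roll back the counter: this request is not admitted
            self.current.fetch_sub(1, Ordering::SeqCst);
            Admission::Rejected
        } else if n > self.soft {
            Admission::AcceptedWithAlert
        } else {
            Admission::Accepted
        }
    }

    fn leave(&self) {
        self.current.fetch_sub(1, Ordering::SeqCst);
    }
}
```

<p>Because memory per request is predictable, the limits can be derived from the server&#39;s capacity instead of guessed.</p>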
<h2 id="not-worrying-about-it">Not worrying about it</h2>
<p>For us, these benefits make a strong case for Rust as a reliable building block for a production platform. This is the piece of code we don&#39;t have to worry about, and it will enable others to run safely.</p>
<h2 id="you-can-do-it-too">You can do it, too!</h2>
<p>Right now, we are replacing small parts of the infrastructure, and we will soon unveil more interesting tools. In the meantime, we added <a href="https://stagingv6.cleverapps.io/doc/rust/rust/">Rust support in beta on our platform</a>: now you can deploy Rust web applications in a few commands.</p>
<p>Go ahead and put Rust in production!</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Let your logs help you</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2016/05/23/let-your-logs-help-you/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Mon, 23 May 2016 14:12:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[cloud]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2016/05/23/let-your-logs-help-you/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>We use logs for everything, to track errors, measure performance, keep a journal of how our software runs, or even debug code in production. Since we use it so much, we should be good at it by now, right?</p>
<span id="more-2805"></span>

<p>Surprisingly, well-written, useful logs are not frequent. The norm is logs full of garbage, old debug statements (&quot;I was here&quot;), unhandled exceptions, and non-actionable information.</p>
<p>You read the file, looking for that one word that should be followed by another word, but two threads are performing the same task and each will output the two log messages. Maybe, if you look at the timing, you&#39;ll know which is which.</p>
<p>You try to make sense of multiple interleaving messages, but they come from two machines with desynchronized clocks, and one of them is sending batches of messages every second instead of sending them as they are generated. You wonder if the code is executing in the right order and write essays about causality and determinism.</p>
<p>I have a hundred stories like those, and there&#39;s a common theme, a root cause for the way we write logs. As developers, we think of them first as a developer&#39;s tool, not as the main interface to check an app&#39;s health in production. They are meant to be read with <code>tail</code> while the application runs on small workloads, for a limited time.</p>
<h2 id="developer-logs-vs-ops-logs">Developer logs VS ops logs</h2>
<p>Managing software in production means detecting when something is wrong, but knowing every bit of the program&#39;s state is a counterproductive way of doing it.</p>
<p>You want to know the important metrics:</p>
<ul>
<li>someone consulted a web page: leave that to your web analytics software</li>
<li>someone consulted a web page, but the app failed to answer: log it</li>
<li>someone consulted a web page, and the page&#39;s size is larger than the buffer:
don&#39;t log it, it&#39;s useless most of the time</li>
<li>someone bought something: log it, even if you have a backend to see that</li>
<li>someone bought something and the server failed: log it and send an email to
the dev team to fix it right now</li>
</ul>
<p>More generally, here is a scale from ops log to developer log:</p>
<ul>
<li>a transaction has been performed, successfully or not =&gt; ops logs</li>
<li>individual steps of that transaction failed =&gt; ops logs</li>
<li>individual steps of that transaction succeeded =&gt; dev logs</li>
<li>value of variable X at point Y in code =&gt; definitely developer logs</li>
</ul>
<p>Here is an example (from code I wrote) of developer centric logs:</p>
<script src="https://gist.github.com/cnivolle/3bf213c8f22c88dbd23f2ae550d175d7.js"></script>

<p>I am not saying developer info should not appear in logs. It is just best hidden under &quot;debug&quot; or &quot;trace&quot; levels, if your logging system supports levels (most do). It is fine to have useless messages in there, as long as they are not used in production. It is fine to use &quot;printf debugging&quot; in development, but it should never appear in production.</p>
<p>Your goal in writing logs is to spare time for the person that will read them. It may be a sysadmin in your company, a client using your software, or yourself, six months from now, trying to put the app back online at 2 AM. Please think of your future self.</p>
<h2 id="making-the-logs-more-ops-friendly">Making the logs more ops friendly</h2>
<p>To make the life of the journal&#39;s reader easier, you need to optimize for two reading engines:</p>
<ul>
<li>filtering software, like grep</li>
<li>the human eye</li>
</ul>
<p>Why the human eye?</p>
<p>Because we are good at detecting patterns, and filtering out the useless parts of an image. If a sequence of three lines appears regularly, we will see it easily. If we only care about the message part of the log, not the prefix (time, PID, etc), we will focus our attention on it and ignore the rest.</p>
<p>The consequence is that everything that breaks the brain&#39;s flow makes your logs harder to read. If the user id appears at the beginning of one message, but at the end of another one, it will be much harder to see which messages are related.</p>
<p>Fortunately, what works well for the human eye also works for filtering tools. Standardized, common prefixes are easy to search for, and easy to recognize. Related messages should carry the same identifying information, always stored in the same place.</p>
<h2 id="practical-advice">Practical advice</h2>
<p>How can you apply those principles right now? You begin by making a small wrapper over your logging library to automatically insert useful information, in a fixed format:</p>
<ul>
<li><em>a timestamp, preferably in ISO8601</em> (easier to read), in UTC (no timezone conversion
when reading). By the way, make sure your servers are all set up to use the same
timezone; this will save headaches</li>
<li><em>a timestamp from a monotonic clock</em> if your application is time critical</li>
<li>identify the current instance: add a <em>server identifier</em> (name, IP, whatever)
and an <em>instance identifier</em> (process id, thread id)</li>
<li><em>the running code&#39;s version</em> (commit id or version number)</li>
<li>the file&#39;s name, line number, class and function names are useful for debugging,
so add them for the &quot;debug&quot; and &quot;trace&quot; levels (but don&#39;t activate those levels in
production unless you have a good reason)</li>
<li>add some correlation information: the <em>user id</em>, a <em>request id</em>, anything that
will let you track which action resulted in which messages</li>
<li><em>code status</em>: are we in the middle of an error? Is something pending?</li>
<li>then, at the end, you can put a written message. You can use structured logging
instead of raw text if you want to track data with more automated tools</li>
</ul>
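<p>A minimal sketch of such a wrapper, here in Python over the standard <code>logging</code> module (the field names, their order, and the version string are illustrative assumptions, not a standard):</p>

```python
import logging
import os
import socket
from datetime import datetime, timezone

APP_VERSION = "1.4.2"  # hypothetical: inject your commit id at build time

class OpsFormatter(logging.Formatter):
    """Emit a fixed, tab-separated prefix before the free-text message."""
    def format(self, record):
        fields = [
            datetime.now(timezone.utc).isoformat(),  # ISO8601 timestamp, UTC
            socket.gethostname(),                    # server identifier
            str(os.getpid()),                        # instance identifier
            APP_VERSION,                             # running code's version
            getattr(record, "request_id", "-"),      # correlation id, if any
            record.levelname,
            record.getMessage(),
        ]
        return "\t".join(fields)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(OpsFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage: pass the correlation id through `extra`
logger.info("payment accepted", extra={"request_id": "req-42"})
```

<p>Because the prefix fields always sit in the same position and are tab-separated, both <code>grep</code>/<code>cut</code> and the human eye can skip straight to the part they care about.</p>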
<p>This is a lot of information to put on one line, but we have great tools at our disposal. We can filter on one of these fields and remove it from the output. We can use terminals larger than 80 characters. We can even color parts of those logs to let the eye separate them easily.</p>
<p>If I had to rewrite the previous example logs that way:</p>
<script src="https://gist.github.com/cnivolle/2486db9d0c73a9986146d85f4fcbf1bc.js"></script>

<p>Side note: beware exceptions. It is fine to display an exception&#39;s stack trace while debugging the code, but an exception in production means one of two things:</p>
<ul>
<li>you forgot to replace the stacktrace with a proper error message for an exception
you handled</li>
<li>there&#39;s an exception that you do not handle in your code</li>
</ul>
<p>In both cases, it is a signal that something wrong happened and that it should be fixed soon. Also, it usually messes up the log&#39;s format, and fills up the log with useless information. I once heard about an app that needed to run on machines with big disks and big CPUs, because there were so many exceptions logged that the machine could not keep up.</p>
<p>Anyway, the goal of that approach is to have a common format for every message, simple to parse and filter. Make sure that those common parts have the same length, and that you use the same separators everywhere. Tabs are usually better than spaces, since we rarely use them in log messages. This will make the logs easier to read, and much easier to filter.</p>
<p>With a good logging discipline, you will soon see non conforming messages as bugs, and you will be much more efficient when debugging and operating your application.</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/logs-help-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p><p>We use logs for everything, to track errors, measure performance, keep a journal of how our software runs, or even debug code in production. Since we use it so much, we should be good at it by now, right?</p>
<span id="more-2805"></span>

<p>Surprisingly, well-written, useful logs are not frequent. The norm is logs full of garbage, old debug statements (&quot;I was here&quot;), unhandled exceptions, and non-actionable information.</p>
<p>You read the file, looking for that one word that should be followed by another word, but two threads are performing the same task and each will output the two log messages. Maybe, if you look at the timing, you&#39;ll know which is which.</p>
<p>You try to make sense of multiple interleaving messages, but they come from two machines with desynchronized clocks, and one of them is sending batches of messages every second instead of sending them as they are generated. You wonder if the code is executing in the right order and write essays about causality and determinism.</p>
<p>I have a hundred stories like those, and there&#39;s a common theme, a root cause for the way we write logs. As developers, we think of them first as a developer&#39;s tool, not as the main interface to check an app&#39;s health in production. They are meant to be read with <code>tail</code> while the application runs on small workloads, for a limited time.</p>
<h2 id="developer-logs-vs-ops-logs">Developer logs VS ops logs</h2>
<p>Managing software in production means detecting when something is wrong, but knowing every bit of the program&#39;s state is a counterproductive way of doing it.</p>
<p>You want to know the important metrics:</p>
<ul>
<li>someone consulted a web page: leave that to your web analytics software</li>
<li>someone consulted a web page, but the app failed to answer: log it</li>
<li>someone consulted a web page, and the page&#39;s size is larger than the buffer:
don&#39;t log it, it&#39;s useless most of the time</li>
<li>someone bought something: log it, even if you have a backend to see that</li>
<li>someone bought something and the server failed: log it and send an email to
the dev team to fix it right now</li>
</ul>
<p>More generally, here is a scale from ops log to developer log:</p>
<ul>
<li>a transaction has been performed, successfully or not =&gt; ops logs</li>
<li>individual steps of that transaction failed =&gt; ops logs</li>
<li>individual steps of that transaction succeeded =&gt; dev logs</li>
<li>value of variable X at point Y in code =&gt; definitely developer logs</li>
</ul>
<p>Here is an example (from code I wrote) of developer centric logs:</p>
<script src="https://gist.github.com/cnivolle/3bf213c8f22c88dbd23f2ae550d175d7.js"></script>

<p>I am not saying developer info should not appear in logs. It is just best hidden under &quot;debug&quot; or &quot;trace&quot; levels, if your logging system supports levels (most do). It is fine to have useless messages in there, as long as they are not used in production. It is fine to use &quot;printf debugging&quot; in development, but it should never appear in production.</p>
<p>Your goal in writing logs is to spare time for the person that will read them. It may be a sysadmin in your company, a client using your software, or yourself, six months from now, trying to put the app back online at 2 AM. Please think of your future self.</p>
<h2 id="making-the-logs-more-ops-friendly">Making the logs more ops friendly</h2>
<p>To make the life of the journal&#39;s reader easier, you need to optimize for two reading engines:</p>
<ul>
<li>filtering software, like grep</li>
<li>the human eye</li>
</ul>
<p>Why the human eye?</p>
<p>Because we are good at detecting patterns, and filtering out the useless parts of an image. If a sequence of three lines appears regularly, we will see it easily. If we only care about the message part of the log, not the prefix (time, PID, etc), we will focus our attention on it and ignore the rest.</p>
<p>The consequence is that everything that breaks the brain&#39;s flow makes your logs harder to read. If the user id appears at the beginning of one message, but at the end of another one, it will be much harder to see which messages are related.</p>
<p>Fortunately, what works well for the human eye also works for filtering tools. Standardized, common prefixes are easy to search for, and easy to recognize. Related messages should carry the same identifying information, always stored in the same place.</p>
<h2 id="practical-advice">Practical advice</h2>
<p>How can you apply those principles right now? You begin by making a small wrapper over your logging library to automatically insert useful information, in a fixed format:</p>
<ul>
<li><em>a timestamp, preferably in ISO8601</em> (easier to read), in UTC (no timezone conversion
when reading). By the way, make sure your servers are all set up to use the same
timezone; this will save headaches</li>
<li><em>a timestamp from a monotonic clock</em> if your application is time critical</li>
<li>identify the current instance: add a <em>server identifier</em> (name, IP, whatever)
and an <em>instance identifier</em> (process id, thread id)</li>
<li><em>the running code&#39;s version</em> (commit id or version number)</li>
<li>the file&#39;s name, line number, class and function names are useful for debugging,
so add them for the &quot;debug&quot; and &quot;trace&quot; levels (but don&#39;t activate those levels in
production unless you have a good reason)</li>
<li>add some correlation information: the <em>user id</em>, a <em>request id</em>, anything that
will let you track which action resulted in which messages</li>
<li><em>code status</em>: are we in the middle of an error? Is something pending?</li>
<li>then, at the end, you can put a written message. You can use structured logging
instead of raw text if you want to track data with more automated tools</li>
</ul>
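<p>A minimal sketch of such a wrapper, here in Python over the standard <code>logging</code> module (the field names, their order, and the version string are illustrative assumptions, not a standard):</p>

```python
import logging
import os
import socket
from datetime import datetime, timezone

APP_VERSION = "1.4.2"  # hypothetical: inject your commit id at build time

class OpsFormatter(logging.Formatter):
    """Emit a fixed, tab-separated prefix before the free-text message."""
    def format(self, record):
        fields = [
            datetime.now(timezone.utc).isoformat(),  # ISO8601 timestamp, UTC
            socket.gethostname(),                    # server identifier
            str(os.getpid()),                        # instance identifier
            APP_VERSION,                             # running code's version
            getattr(record, "request_id", "-"),      # correlation id, if any
            record.levelname,
            record.getMessage(),
        ]
        return "\t".join(fields)

logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(OpsFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage: pass the correlation id through `extra`
logger.info("payment accepted", extra={"request_id": "req-42"})
```

<p>Because the prefix fields always sit in the same position and are tab-separated, both <code>grep</code>/<code>cut</code> and the human eye can skip straight to the part they care about.</p>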
<p>This is a lot of information to put on one line, but we have great tools at our disposal. We can filter on one of these fields and remove it from the output. We can use terminals larger than 80 characters. We can even color parts of those logs to let the eye separate them easily.</p>
<p>If I had to rewrite the previous example logs that way:</p>
<script src="https://gist.github.com/cnivolle/2486db9d0c73a9986146d85f4fcbf1bc.js"></script>

<p>Side note: beware exceptions. It is fine to display an exception&#39;s stack trace while debugging the code, but an exception in production means one of two things:</p>
<ul>
<li>you forgot to replace the stacktrace with a proper error message for an exception
you handled</li>
<li>there&#39;s an exception that you do not handle in your code</li>
</ul>
<p>In both cases, it is a signal that something wrong happened and that it should be fixed soon. Also, it usually messes up the log&#39;s format, and fills up the log with useless information. I once heard about an app that needed to run on machines with big disks and big CPUs, because there were so many exceptions logged that the machine could not keep up.</p>
<p>Anyway, the goal of that approach is to have a common format for every message, simple to parse and filter. Make sure that those common parts have the same length, and that you use the same separators everywhere. Tabs are usually better than spaces, since we rarely use them in log messages. This will make the logs easier to read, and much easier to filter.</p>
<p>With a good logging discipline, you will soon see non conforming messages as bugs, and you will be much more efficient when debugging and operating your application.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Security is a process, not a reaction</title>
		<link>https://stagingv6.cleverapps.io/blog/company/2016/04/04/security-is-a-process/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Mon, 04 Apr 2016 15:32:00 +0000</pubDate>
				<category><![CDATA[Company]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[cloud]]></category>
		<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2016/04/04/security-is-a-process/</guid>

					<description><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p>Wake up. Check the news. There is a new OpenSSL vulnerability, the world is on fire. That vulnerability was published a week ago. Panic. Patch everything in a hurry. Break production. Panic^2.

<span id="more-2746"></span>

If this sounds familiar, you are probably running a web application of some kind. Maybe your whole business depends on it. Maybe you didn't hear about the latest world-on-fire vulnerability. Panic.

How do you keep up with security issues when everything is happening so fast? Which parts of your technical stack are the most at risk? Is the customer data safe? Do you really need to care?

At Clever Cloud, we support many languages and databases, running on hundreds of machines. And our core business is to execute code we didn't write, on our infrastructure.

This has an interesting effect on security management: there is always an issue somewhere. Vulnerabilities appear every day. You are lucky if they are not "0-day vulnerabilities": flaws published without notifying the developers, meaning there is no fix available at publication time. How do we handle security calmly when we should actually run around screaming?

Our approach to security comes from the way we run our systems. You cannot manage hundreds of machines without automation and well-defined processes. Every action on our infrastructure must be cheap to perform, or justified by a great impact.

People see security as a huge cost because of the work it implies:
<ul>
 	<li>unclear risk and impact on the business</li>
 	<li>time spent tracking new vulnerabilities for various applications</li>
 	<li>unclear result of updating code (will it stop working? Will it break other applications on the same machine?)</li>
</ul>
You want to reduce that cost, make security management easier and easier, until it is just a part of a day's job.
<h2 id="defining-your-risk-budget">Defining your risk budget</h2>
Calculating the risk requires some time at first, to teach your team how a threat model works, and how to update it. The threat model is a description of your system used to evaluate the cost of an attack:
<ul>
 	<li>targets: user data, intellectual property, machines</li>
 	<li>entry points: web server, internal WiFi</li>
 	<li>weaknesses: unpatched application, SQL injection, key employees victims of phishing</li>
</ul>
With this model, you calculate the difficulty of exploiting each weakness, which access level you obtain, and where you can go from there. At the end, you get a list of issues, ordered by their impact on your system and their ease of exploitation. Typically, if an automated script can steal your whole database, fix it immediately.
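As a rough sketch of that ordering step (the issues and scores below are invented for illustration), in Python:

```python
# Hypothetical threat-model entries: each weakness gets an impact score
# and an ease-of-exploitation score (higher = worse / easier).
issues = [
    {"name": "SQL injection in search form", "impact": 9, "ease": 8},
    {"name": "unpatched blog engine",        "impact": 4, "ease": 7},
    {"name": "phishing of key employees",    "impact": 8, "ease": 5},
    {"name": "open internal WiFi",           "impact": 6, "ease": 3},
]

# Order by impact, then by ease of exploitation, worst first.
ranked = sorted(issues, key=lambda i: (i["impact"], i["ease"]), reverse=True)

for issue in ranked:
    print(f'{issue["impact"]}\t{issue["ease"]}\t{issue["name"]}')
# The SQL injection (high impact, easily automated) comes out on top:
# fix it immediately.
```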

That model is the baseline everybody will use to evaluate security issues. It makes the risk real, not something you can just handwave away by saying "we can take that risk". It is something you can plan and budget for.
<h2 id="staying-up-to-date-with-security-news">Staying up to date with security news</h2>
Once you have a model, you need to keep it up to date with current news. Maybe requiring Java applets in your client's browsers is not such a good idea anymore. Maybe your advertisement network is now serving malware (as a side note, to drastically reduce malware infection at your company, install ad blockers everywhere, trust me on this).

Following security news can look like a daunting task, but you can simplify it with good sources:
<ul>
 	<li>avoid news websites. They write long articles, they want you to panic and they rarely provide usable solutions</li>
 	<li>Follow security mailing lists. There are generalist ones, like <a href="mailto:oss-security@lists.openwall.com">oss-security@lists.openwall.com</a> and <a href="mailto:cve-assign@mitre.org">cve-assign@mitre.org</a>. There are more specific ones, like <a href="mailto:debian-security@lists.debian.org">debian-security@lists.debian.org</a> (translate to your specific distribution), or <a href="mailto:rubyonrails-security@googlegroups.com">rubyonrails-security@googlegroups.com</a> and <a href="mailto:ruby-security-ann@googlegroups.com">ruby-security-ann@googlegroups.com</a>. There is also <a href="mailto:fulldisclosure@seclists.org">fulldisclosure@seclists.org</a>, where 0-day vulnerabilities are sometimes published</li>
 	<li>Twitter is still a good source of information on vulnerabilities, since people easily share. If you see security people suddenly buzzing in your timeline, you should pay attention. There are good lists of people to follow to get you started <a href="http://www.securityinnovationeurope.com/blog/87-security-experts-you-need-to-be-following-on-twitter">here</a> and <a href="http://www.marblesecurity.com/2013/11/20/100-security-experts-follow-twitter/">there</a>. They each have their own focus, though, so you may not be interested in everything</li>
 	<li>keep up with new versions of your software and their dependencies. Use your package manager, project specific mailing lists, subscribe to their github feed</li>
</ul>
Tracking security news becomes a simple process:
<ul>
 	<li>check the mailing lists, see if you use any of the applications mentioned</li>
 	<li>check your dependencies: anything new? Any security issues mentioned?</li>
 	<li>check Twitter: is the world on fire?</li>
</ul>
Be careful, though. Twitter is often on fire, and security experts like to jump on the new vulnerability and dissect it at length, even when there is no information available. Not every vulnerability needs attention right now, and some of them may not even apply to your particular usage of the software. Don't panic (yet).

Taking the time to verify security issues regularly makes security part of your daily/weekly process. Applying a security patch is just another item to raise at your morning stand up meeting (or whatever other process).

Note that the person tracking the vulnerability might not be the one fixing it. When I first learned about the <a href="https://weakdh.org/">Logjam flaw</a>, I was about to board a plane for 10 hours. Notify the team by SMS/Slack, get an acknowledgment from someone, then go to sleep.
<h2 id="reducing-the-risk-of-code-updates">Reducing the risk of code updates</h2>
Here lies the huge cost of security: any code change in production is a potential liability. It brings no value to the customer, can introduce bugs or even crash the whole system (please make backups and test them regularly).

But this cost is not limited to security. It applies to your whole business. If modifying the production environment is complex and error prone, bugfixes come rarely. New versions come in huge chunks of code that <em>will</em> break things. Huge lists of changes may even require some service downtime.

The point of our job at Clever Cloud is to make new deployments fast and painless. It has influenced our whole approach to security. If you can start and remove a new instance of your application in seconds, you get huge benefits:
<ul>
 	<li>staging environments to test updates</li>
 	<li>replacing huge, risky updates with small increments</li>
 	<li>applications can be completely independent. Updating the company's WordPress blog will not affect the SaaS application</li>
</ul>
This is how we do code updates now: when a project's dependency gets a new version to fix a security issue, just redeploy the application. When there's a security patch for the Linux kernel, apply the patch, redeploy all the virtual machines, move on.
<figure><img id="img" src="https://www2.cleverapps.io/app/uploads/2021/08/redeploy-vm.jpg" /></figure>
We do not run around with our hair on fire. It is just a basic loop of:
<ul>
 	<li>get notified of a vulnerability</li>
 	<li>see if it applies</li>
 	<li>see if there's a patch (or if you can develop one quickly)</li>
 	<li>apply the patch</li>
 	<li>redeploy the applications</li>
 	<li>go make yourself a nice tea</li>
</ul>
We have good examples of this:
<ul>
 	<li>The recent CVE-2016-0728 is a privilege escalation in Linux, something we need to take seriously. We took a look at <a href="http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/">the advisory</a>, wrote a patch, tested it and deployed it in a few hours. Most Linux distributions took days to publish updated packages.</li>
 	<li>In the same way, the infamous <a href="http://heartbleed.com/">Heartbleed bug</a> was fixed quickly. One of our clients came to us hours later asking if we knew about it: "oh, that's the reason my applications were redeployed in the middle of the night"</li>
</ul>
When deploying new versions of an application is easy, it suddenly reduces the cost of code changes. The operational risk gets tiny, compared to the security risk. And you can update everything fast. You have no more excuse to keep unpatched systems.

Following those tips to set up your security process will improve your operations as well. With a systematic approach, you know your application better, you can see the cost of managing issues and take action.

There is still a lot to talk about, like training for incidents, defining operations procedures, or how to set up your infrastructure for easy deployments. But that last item, we can handle it for you <a href="https://stagingv6.cleverapps.io/contact/">right now</a>.]]></description>
										<content:encoded><![CDATA[<p><img width="1400" height="540" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1.jpg 1400w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/security-link-1-1368x528.jpg 1368w" sizes="auto, (max-width: 1400px) 100vw, 1400px" /></p>Wake up. Check the news. There is a new OpenSSL vulnerability, the world is on fire. That vulnerability was published a week ago. Panic. Patch everything in a hurry. Break production. Panic^2.

<span id="more-2746"></span>

If this sounds familiar, you are probably running a web application of some kind. Maybe your whole business depends on it. Maybe you didn't hear about the latest world-on-fire vulnerability. Panic.

How do you keep up with security issues when everything is happening so fast? Which parts of your technical stack are the most at risk? Is the customer data safe? Do you really need to care?

At Clever Cloud, we support many languages and databases, running on hundreds of machines. And our core business is to execute code we didn't write, on our infrastructure.

This has an interesting effect on security management: there is always an issue somewhere. Vulnerabilities appear every day. You are lucky if they are not "0-day vulnerabilities": flaws published without notifying the developers, meaning there is no fix available at publication time. How do we handle security calmly when we should actually run around screaming?

Our approach to security comes from the way we run our systems. You cannot manage hundreds of machines without automation and well-defined processes. Every action on our infrastructure must be cheap to perform, or justified by a great impact.

People see security as a huge cost because of the work it implies:
<ul>
 	<li>unclear risk and impact on the business</li>
 	<li>time spent tracking new vulnerabilities for various applications</li>
 	<li>unclear result of updating code (will it stop working? Will it break other applications on the same machine?)</li>
</ul>
You want to reduce that cost, make security management easier and easier, until it is just a part of a day's job.
<h2 id="defining-your-risk-budget">Defining your risk budget</h2>
Calculating the risk requires some time at first, to teach your team how a threat model works, and how to update it. The threat model is a description of your system used to evaluate the cost of an attack:
<ul>
 	<li>targets: user data, intellectual property, machines</li>
 	<li>entry points: web server, internal WiFi</li>
 	<li>weaknesses: unpatched application, SQL injection, key employees victims of phishing</li>
</ul>
With this model, you calculate the difficulty of exploiting each weakness, which access level you obtain, and where you can go from there. At the end, you get a list of issues, ordered by their impact on your system and their ease of exploitation. Typically, if an automated script can steal your whole database, fix it immediately.
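As a rough sketch of that ordering step (the issues and scores below are invented for illustration), in Python:

```python
# Hypothetical threat-model entries: each weakness gets an impact score
# and an ease-of-exploitation score (higher = worse / easier).
issues = [
    {"name": "SQL injection in search form", "impact": 9, "ease": 8},
    {"name": "unpatched blog engine",        "impact": 4, "ease": 7},
    {"name": "phishing of key employees",    "impact": 8, "ease": 5},
    {"name": "open internal WiFi",           "impact": 6, "ease": 3},
]

# Order by impact, then by ease of exploitation, worst first.
ranked = sorted(issues, key=lambda i: (i["impact"], i["ease"]), reverse=True)

for issue in ranked:
    print(f'{issue["impact"]}\t{issue["ease"]}\t{issue["name"]}')
# The SQL injection (high impact, easily automated) comes out on top:
# fix it immediately.
```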

That model is the baseline everybody will use to evaluate security issues. It makes the risk real, not something you can handwave away by saying "we can take that risk". It is something you can plan and budget for.
<h2 id="staying-up-to-date-with-security-news">Staying up to date with security news</h2>
Once you have a model, you need to keep it up to date with current news. Maybe requiring Java applets in your clients' browsers is not such a good idea anymore. Maybe your advertising network is now serving malware (as a side note: to drastically reduce malware infections at your company, install ad blockers everywhere, trust me on this).

Following security news can look like a daunting task, but you can simplify it with good sources:
<ul>
 	<li>Avoid news websites. They write long articles, they want you to panic, and they rarely provide usable solutions</li>
 	<li>Follow security mailing lists. There are generalist ones, like <a href="mailto:oss-security@lists.openwall.com">oss-security@lists.openwall.com</a> and <a href="mailto:cve-assign@mitre.org">cve-assign@mitre.org</a>. There are more specific ones, like <a href="mailto:debian-security@lists.debian.org">debian-security@lists.debian.org</a> (translate to your specific distribution), or <a href="mailto:rubyonrails-security@googlegroups.com">rubyonrails-security@googlegroups.com</a> and <a href="mailto:ruby-security-ann@googlegroups.com">ruby-security-ann@googlegroups.com</a>. There is also <a href="mailto:fulldisclosure@seclists.org">fulldisclosure@seclists.org</a>, where 0-day vulnerabilities are sometimes published</li>
 	<li>Twitter is still a good source of information on vulnerabilities, since people easily share. If you see security people suddenly buzzing in your timeline, you should pay attention. There are good lists of people to follow to get you started <a href="http://www.securityinnovationeurope.com/blog/87-security-experts-you-need-to-be-following-on-twitter">here</a> and <a href="http://www.marblesecurity.com/2013/11/20/100-security-experts-follow-twitter/">there</a>. They each have their own focus, though, so you may not be interested in everything</li>
 	<li>Keep up with new versions of your software and its dependencies. Use your package manager and project-specific mailing lists, or subscribe to their GitHub feeds</li>
</ul>
Tracking security news becomes a simple process:
<ul>
 	<li>check the mailing lists, see if you use any of the applications mentioned</li>
 	<li>check your dependencies: anything new? Any security issues mentioned?</li>
 	<li>check Twitter: is the world on fire?</li>
</ul>
Be careful, though. Twitter is often on fire, and security experts like to jump on the newest vulnerability and dissect it at length, even when little information is available. Not every vulnerability needs attention right now, and some may not even apply to your particular usage of the software. Don't panic (yet).

Taking the time to review security issues regularly makes security part of your daily or weekly process. Applying a security patch becomes just another item to raise at your morning stand-up meeting (or whatever process you use).

Note that the person tracking a vulnerability might not be the one fixing it. When I first learned about the <a href="https://weakdh.org/">Logjam flaw</a>, I was about to board a plane for 10 hours. Notify the team by SMS or Slack, get an acknowledgment from someone, then go to sleep.
<h2 id="reducing-the-risk-of-code-updates">Reducing the risk of code updates</h2>
Here lies the huge cost of security: any code change in production is a potential liability. It brings no value to the customer, and it can introduce bugs or even crash the whole system (please make backups and test them regularly).

But this cost is not limited to security. It applies to your whole business. If modifying the production environment is complex and error-prone, bugfixes come rarely. New versions come in huge chunks of code that <em>will</em> break things. Huge lists of changes may even require some service downtime.

The point of our job at Clever Cloud is to make new deployments fast and painless. This has influenced our whole approach to security. If you can start and remove a new instance of your application in seconds, you get huge benefits:
<ul>
 	<li>staging environments to test updates</li>
 	<li>replacing huge, risky updates with small increments</li>
 	<li>applications can be completely independent: updating the company's WordPress blog will not affect the SaaS application</li>
</ul>
This is how we do code updates now: when a project's dependency gets a new version to fix a security issue, just redeploy the application. When there's a security patch for the Linux kernel, apply the patch, redeploy all the virtual machines, move on.
<figure><img id="img" src="https://www2.cleverapps.io/app/uploads/2021/08/redeploy-vm.jpg" /></figure>
We do not run around with our hair on fire. It is just a basic loop of:
<ul>
 	<li>get notified of a vulnerability</li>
 	<li>see if it applies</li>
 	<li>see if there's a patch (or if you can develop one quickly)</li>
 	<li>apply the patch</li>
 	<li>redeploy the applications</li>
 	<li>go make yourself a nice tea</li>
</ul>
We have good examples of this:
<ul>
 	<li>The recent CVE-2016-0728 is a privilege escalation in the Linux kernel, something we need to take seriously. We took a look at <a href="http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/">the advisory</a>, wrote a patch, tested it and deployed it in a few hours. Most Linux distributions took days to publish updated packages.</li>
 	<li>In the same way, the infamous <a href="http://heartbleed.com/">Heartbleed bug</a> was fixed quickly. One of our clients came to us hours later asking if we knew about it: "oh, so that's why my applications were redeployed in the middle of the night"</li>
</ul>
When deploying new versions of an application is easy, it suddenly reduces the cost of code changes. The operational risk gets tiny, compared to the security risk. And you can update everything fast. You have no more excuse to keep unpatched systems.

Following those tips to set up your security process will improve your operations as well. With a systematic approach, you know your applications better, you can see the cost of managing issues, and you can take action.

There is still a lot to talk about, like training for incidents, defining operations procedures, or how to set up your infrastructure for easy deployments. But that last item, we can handle it for you <a href="https://stagingv6.cleverapps.io/contact/">right now</a>.]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>nom 1.0 is here! REJOICE!</title>
		<link>https://stagingv6.cleverapps.io/blog/engineering/2015/11/16/nom-1-0/</link>
		
		<dc:creator><![CDATA[Geoffroy Couprie]]></dc:creator>
		<pubDate>Mon, 16 Nov 2015 14:35:00 +0000</pubDate>
				<category><![CDATA[Engineering]]></category>
		<category><![CDATA[Rust]]></category>
		<category><![CDATA[Security]]></category>
		<guid isPermaLink="false">https://www2.cleverapps.io/wp/blog/technology/2015/11/16/nom-1-0/</guid>

					<description><![CDATA[<p><img width="2560" height="987" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1.jpg 2560w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1536x592.jpg 1536w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-2048x790.jpg 2048w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1368x527.jpg 1368w" sizes="auto, (max-width: 2560px) 100vw, 2560px" /></p><p><a href="https://github.com/Geal/nom">nom</a> is a parser combinator library written in Rust that I started about a year ago. Its goal is to let you write parsers that are safe by default, fast, and abstract all of the dangerous or annoying details of data consumption.</p>
<span id="more-2802"></span>

<p>During that year, more than 50 projects have started using it, from toy parsers to high performance production code. Their feedback has been invaluable for improving the library, covering more and more parsing patterns, and testing ideas about what makes a great parser library. The 1.0 version is the result of that feedback: a more stable release, but also a few breaking changes to improve the architecture and make it more flexible and easier to use. We now feel it is reliable enough to be used in production at Clever Cloud. We have a lot of data to manage, coming from trusted and untrusted sources, and this is exactly the kind of tool we need to build a safe infrastructure.</p>
<p>The number of open source projects using nom has been really helpful in developing this stable release. If you maintain one of those projects, you may have received a pull request from me. That&#39;s right: I took care of testing the 1.0 branch on every project I could get my hands on, to see what would break, which features developers were using, and to document the upgrade process. This has been a lot of work, but worth it. I&#39;ll probably say more about that in a future blog post, for other library maintainers who want to try the approach.</p>
<p>That&#39;s all good, but why would you use nom right now? Let&#39;s see!</p>
<p>nom is fast. How fast? A few <a href="https://github.com/Geal/nom_benchmarks">benchmarks</a> have shown that it is consistently faster than Parsec and attoparsec (Haskell parser combinator libraries), faster than other Rust parser combinator libraries, and even <a href="https://github.com/hoodie/dateparser_benchmarks/blob/3d76de5edc11ddcb98446bcd2b66e64a87652705/README.md">faster than Rust&#39;s regular expression library</a>. There is even a benchmark where it beats Joyent&#39;s <a href="https://github.com/nodejs/http-parser">http-parser</a> on parsing HTTP request headers.</p>
<p>Why is it faster? I have a few ideas about this. First, unlike most parser combinator systems, nom does not copy data if it is not needed. It makes heavy use of slices, a Rust data structure containing a pointer and a length. Since Rust&#39;s compiler enforces memory safety, you can afford to refer to the original input from the beginning to the end of the parse, without copying anything.</p>
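To make the zero-copy idea concrete, here is a hand-rolled sketch. It illustrates the principle only, not nom&#39;s actual API: a parser takes an input slice and, on success, returns the remaining input and the matched bytes as sub-slices pointing into the original buffer.

```rust
// Hand-rolled illustration of slice-based, zero-copy parsing
// (the principle nom relies on, not nom's real API).
// On success, both returned slices borrow from the original
// input buffer, so no bytes are ever copied.

fn tag<'a>(input: &'a [u8], pattern: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(pattern) {
        // (remaining input, matched bytes)
        Some((&input[pattern.len()..], &input[..pattern.len()]))
    } else {
        None
    }
}

fn main() {
    let input = b"GET /index.html";
    if let Some((rest, method)) = tag(input, b"GET ") {
        // `method` and `rest` point into `input`: same memory, no copy.
        assert_eq!(method.as_ptr(), input.as_ptr());
        println!("method: {:?}", std::str::from_utf8(method));
        println!("rest:   {:?}", std::str::from_utf8(rest));
    }
}
```

Because the compiler tracks the lifetimes of those borrows, the sub-slices can never outlive the buffer they point into, which is what makes this style safe.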
<p>Second, nom does not chain parsers at runtime. The macros directly generate the parsing code at compile time. This creates very linear code, something that CPUs find very easy to handle. If you tried to decompile the final binary to C code, you would just see a long list of if-else branches.</p>
<p>It is also a safe alternative to handwritten C parsers. nom bases its memory safety on Rust&#39;s compiler: it knows, at any moment, which part of the code owns which part of the memory, prevents out-of-bounds accesses, and automatically manages memory allocation and deallocation. And since that alone is not enough, some nom parsers were fuzzed to hell with <a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>, just to verify those claims.</p>
<p>The result? The only flaws found appeared not in nom-generated code, but in code written manually outside of nom: index calculations that could overflow if a specific value appeared in the input. And those could not result in memory corruption, just crashes.</p>
<blockquote>
<p>you can quickly write a parser that will be safe by default</p>
</blockquote>
<p>This has awesome implications: you can quickly write a parser that will be safe by default. This lets you test ideas, experiment with your design, without fear for your security.</p>
<p>You should now see where I&#39;m going: with parsers that are easy to write, as fast as or faster than handwritten C, and safe by default, you can replace old and vulnerable C parsers. Rust can work without a runtime, and is easily embedded in C code. It has already been used to write extensions for Ruby, Python, NodeJS and others. It is only a matter of time until it replaces the vulnerable parts of current C projects.</p>
<p>This is one of my long term goals: making reliable, safe building blocks to build our systems. Not only new bricks, but also replacing the old ones. This will require a tremendous effort, and nom is just the first step, but a big one.</p>
<p>To get started using nom, you can include it in your Rust projects from <a href="https://crates.io/crates/nom">crates.io</a>. Here are a few links you will find useful:</p>
<ul>
<li><a href="https://github.com/Geal/nom">Github repository Geal/nom</a></li>
<li><a href="http://rust.unhandledexpression.com/nom/">Reference documentation</a></li>
<li><a href="https://github.com/Geal/nom/wiki/Upgrading-to-nom-1.0">Upgrading to nom 1.0</a></li>
<li><a href="https://gitter.im/Geal/nom">Gitter chat room</a>. You can also go to the #nom IRC
channel on irc.mozilla.org, or ping &#39;geal&#39; on Mozilla, Freenode, Geeknode or oftc IRC</li>
<li><a href="https://fnordig.de/2015/07/16/omnomnom-parsing-iso8601-dates-using-nom/">Tutorial about parsing ISO8601 dates</a></li>
<li><a href="https://github.com/Geal/nom/wiki/Making-a-new-parser-from-scratch">Making a new parser from scratch</a>
(general tips on writing a parser and code architecture)</li>
<li><a href="https://github.com/Geal/nom/wiki/Error-management">How to handle parser errors</a></li>
<li><a href="https://github.com/Geal/nom/wiki/How-nom-macros-work">How nom&#39;s macro combinators work</a></li>
</ul>
<p>Also, if you have existing code running older versions of nom, please take a look at the <a href="https://github.com/Geal/nom/wiki/Upgrading-to-nom-1.0">upgrade documentation</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><img width="2560" height="987" src="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1.jpg" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="" decoding="async" loading="lazy" srcset="https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1.jpg 2560w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-300x116.jpg 300w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1024x395.jpg 1024w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-768x296.jpg 768w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1536x592.jpg 1536w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-2048x790.jpg 2048w, https://staging-cc-assetsv6.cellar-c2.services.clever-cloud.com/uploads/2021/08/nom10-scaled-1-1368x527.jpg 1368w" sizes="auto, (max-width: 2560px) 100vw, 2560px" /></p><p><a href="https://github.com/Geal/nom">nom</a> is a parser combinator library written in Rust that I started about a year ago. Its goal is to let you write parsers that are safe by default, fast, and abstract all of the dangerous or annoying details of data consumption.</p>
<span id="more-2802"></span>

<p>During that year, more than 50 projects have started using it, from toy parsers to high performance production code. Their feedback has been invaluable for improving the library, covering more and more parsing patterns, and testing ideas about what makes a great parser library. The 1.0 version is the result of that feedback: a more stable release, but also a few breaking changes to improve the architecture and make it more flexible and easier to use. We now feel it is reliable enough to be used in production at Clever Cloud. We have a lot of data to manage, coming from trusted and untrusted sources, and this is exactly the kind of tool we need to build a safe infrastructure.</p>
<p>The number of open source projects using nom has been really helpful in developing this stable release. If you maintain one of those projects, you may have received a pull request from me. That&#39;s right: I took care of testing the 1.0 branch on every project I could get my hands on, to see what would break, which features developers were using, and to document the upgrade process. This has been a lot of work, but worth it. I&#39;ll probably say more about that in a future blog post, for other library maintainers who want to try the approach.</p>
<p>That&#39;s all good, but why would you use nom right now? Let&#39;s see!</p>
<p>nom is fast. How fast? A few <a href="https://github.com/Geal/nom_benchmarks">benchmarks</a> have shown that it is consistently faster than Parsec and attoparsec (Haskell parser combinator libraries), faster than other Rust parser combinator libraries, and even <a href="https://github.com/hoodie/dateparser_benchmarks/blob/3d76de5edc11ddcb98446bcd2b66e64a87652705/README.md">faster than Rust&#39;s regular expression library</a>. There is even a benchmark where it beats Joyent&#39;s <a href="https://github.com/nodejs/http-parser">http-parser</a> on parsing HTTP request headers.</p>
<p>Why is it faster? I have a few ideas about this. First, unlike most parser combinator systems, nom does not copy data if it is not needed. It makes heavy use of slices, a Rust data structure containing a pointer and a length. Since Rust&#39;s compiler enforces memory safety, you can afford to refer to the original input from the beginning to the end of the parse, without copying anything.</p>
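To make the zero-copy idea concrete, here is a hand-rolled sketch. It illustrates the principle only, not nom&#39;s actual API: a parser takes an input slice and, on success, returns the remaining input and the matched bytes as sub-slices pointing into the original buffer.

```rust
// Hand-rolled illustration of slice-based, zero-copy parsing
// (the principle nom relies on, not nom's real API).
// On success, both returned slices borrow from the original
// input buffer, so no bytes are ever copied.

fn tag<'a>(input: &'a [u8], pattern: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(pattern) {
        // (remaining input, matched bytes)
        Some((&input[pattern.len()..], &input[..pattern.len()]))
    } else {
        None
    }
}

fn main() {
    let input = b"GET /index.html";
    if let Some((rest, method)) = tag(input, b"GET ") {
        // `method` and `rest` point into `input`: same memory, no copy.
        assert_eq!(method.as_ptr(), input.as_ptr());
        println!("method: {:?}", std::str::from_utf8(method));
        println!("rest:   {:?}", std::str::from_utf8(rest));
    }
}
```

Because the compiler tracks the lifetimes of those borrows, the sub-slices can never outlive the buffer they point into, which is what makes this style safe.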
<p>Second, nom does not chain parsers at runtime. The macros directly generate the parsing code at compile time. This creates very linear code, something that CPUs find very easy to handle. If you tried to decompile the final binary to C code, you would just see a long list of if-else branches.</p>
<p>It is also a safe alternative to handwritten C parsers. nom bases its memory safety on Rust&#39;s compiler: it knows, at any moment, which part of the code owns which part of the memory, prevents out-of-bounds accesses, and automatically manages memory allocation and deallocation. And since that alone is not enough, some nom parsers were fuzzed to hell with <a href="http://lcamtuf.coredump.cx/afl/">American Fuzzy Lop</a>, just to verify those claims.</p>
<p>The result? The only flaws found appeared not in nom-generated code, but in code written manually outside of nom: index calculations that could overflow if a specific value appeared in the input. And those could not result in memory corruption, just crashes.</p>
<blockquote>
<p>you can quickly write a parser that will be safe by default</p>
</blockquote>
<p>This has awesome implications: you can quickly write a parser that will be safe by default. This lets you test ideas, experiment with your design, without fear for your security.</p>
<p>You should now see where I&#39;m going: with parsers that are easy to write, as fast as or faster than handwritten C, and safe by default, you can replace old and vulnerable C parsers. Rust can work without a runtime, and is easily embedded in C code. It has already been used to write extensions for Ruby, Python, NodeJS and others. It is only a matter of time until it replaces the vulnerable parts of current C projects.</p>
<p>This is one of my long term goals: making reliable, safe building blocks to build our systems. Not only new bricks, but also replacing the old ones. This will require a tremendous effort, and nom is just the first step, but a big one.</p>
<p>To get started using nom, you can include it in your Rust projects from <a href="https://crates.io/crates/nom">crates.io</a>. Here are a few links you will find useful:</p>
<ul>
<li><a href="https://github.com/Geal/nom">Github repository Geal/nom</a></li>
<li><a href="http://rust.unhandledexpression.com/nom/">Reference documentation</a></li>
<li><a href="https://github.com/Geal/nom/wiki/Upgrading-to-nom-1.0">Upgrading to nom 1.0</a></li>
<li><a href="https://gitter.im/Geal/nom">Gitter chat room</a>. You can also go to the #nom IRC
channel on irc.mozilla.org, or ping &#39;geal&#39; on Mozilla, Freenode, Geeknode or oftc IRC</li>
<li><a href="https://fnordig.de/2015/07/16/omnomnom-parsing-iso8601-dates-using-nom/">Tutorial about parsing ISO8601 dates</a></li>
<li><a href="https://github.com/Geal/nom/wiki/Making-a-new-parser-from-scratch">Making a new parser from scratch</a>
(general tips on writing a parser and code architecture)</li>
<li><a href="https://github.com/Geal/nom/wiki/Error-management">How to handle parser errors</a></li>
<li><a href="https://github.com/Geal/nom/wiki/How-nom-macros-work">How nom&#39;s macro combinators work</a></li>
</ul>
<p>Also, if you have existing code running older versions of nom, please take a look at the <a href="https://github.com/Geal/nom/wiki/Upgrading-to-nom-1.0">upgrade documentation</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
