Seven Lessons from Building a Large, Distributed System

  1. The network is not your friend.  Machines will lose connectivity.  Sometimes it will be a permanent loss, other times temporary.  The long tail of packet delivery time is merciless.  Latency will sometimes spike for minutes at a time, tricking you into thinking that your machines have lost connectivity.  It is very hard to tell the difference between a loss of connectivity and a temporary spike in latency.

  2. Everything will fail.  In a distributed system, with an unreliable network, each and every component, and probably each and every line of code, will inevitably fail.  What happens when your machines lose power?  Lose network connectivity?  A disk dies?  Another service fails?  Run out of memory?  Byzantine failures?  (They happen!)  Cascading failures?  Race conditions?

  3. Dynamic typing will slowly become a nightmare.  As your code base grows into the tens of thousands of lines, how will you verify that your functions are always called with appropriately typed parameters?  If you have a service oriented architecture, how do you make sure that only appropriately typed parameters are passed into services when any service may suffer a byzantine failure?  If a race condition that only presents itself monthly in a package you depend on modifies the type of one of your objects, how does your system respond?

  4. Reasoning about time will become very difficult.  Causality is very difficult to establish when you cannot establish an order of events.  Did the nfs server crash first, or did the worker pool?  What caused what?  Check the logs and you will see the same timestamps everywhere.

  5. Refactoring a large, service-oriented system is hard. Even with industry standard abstractions, changing the interface to a service is very hard.  If you make a change to one interface, how long will it take to update the entire codebase?

  6. The only real test is production.  However great your test coverage is, no test is like the real test of production.  In production, things happen that you can’t even conceive of beforehand.  Mistakes will be made.  Incorrect configs and broken code will inevitably get pushed.  Users will find every race condition.

  7. Do not aim for 100%.  No distributed system will work 100% of the time.  Proof: earth hit by asteroid.  So, the best you can do is pick a target.  Mine is 99%; i want my system to fail at most 1 time in 100 in production.  For a commercial system, you can pick something higher: 1 failure in 1,000 or 1 failure in 100,000.

     

Functional C Programming (2): The Garbage Collector

TL;DR

tl;dr With mechanisms to track and free pointers, we can implement a simple garbage tracker / collector for functionalC.  Garbage collection has already been added to lists, closures, and lifted types.

This is the first part of a series on implementing the garbage collector.


Background

After I posted the first version of functionalC to github, there were quite a few complaints about the lack of memory management / garbage collection.  So, I added a garbage tracker to functionalC.

I write tracker, rather than collector, because our simple system will just track allocated objects and free them when instructed to.  It does not (yet) do reference counting and is blissfully unaware of the program’s execution.  (More on that in part 2.)

Implementing the Tracker – A List of References

The first step in implementing our garbage tracker is keeping track of all allocated memory.  To do this, we need a few things:

  1. An enumeration of the types in our type system
  2. A struct that tracks a pointer and its type
  3. A mechanism to store a collection of these structs

From the code:

//we can track all different types and their associated destructors
enum TYPE {
  LIST,
  ENVOBJ,  
  CLOSURE,
  STANDARD //gc's a generic obj (void *)
};

struct ref {
  void *ptr; //pointer to the obj 
  TYPE type; //obj type
  ref *next; //refs are stored as a list;
};

Our TYPE enum works for 1. and our ref struct for 2.  We will store groups of references as lists, hence the ref *next field.

Now, putting these together, the structure of our garbage tracker is simple: simply store a list of references for tracking.

struct gc {
  ref *marked;        //these will not be freed unless unmarked
  ref *last_marked;   //append should be O(1)
  ref *unmarked;      //these will be freed when gc_collect is called
  ref *last_unmarked; //we want append to be O(1)
  void (*destructor_table[TYPE_COUNT])(void *);  
    //TYPE_COUNT is number of all tracked types
};

Our unmarked list stores the references that are ready for collection.  When a user marks an object for safe keeping, it is moved over to a marked, safe list.  More on that later.

I have also added an optimization here: we keep track of the last element in the unmarked list so that appending to the unmarked list is O(1).  This substantially improves the performance of the collector over a list with O(n) appends.

Implementing the Collector – Destructors & Collection

Once we are tracking pointers and their associated types, we need to know how to destroy  an object of a certain type.  To do this, we maintain an array, indexed by type, of function pointers to destructors.

//remember this part of the gc struct?
//this table is a map of TYPES => destructors 
//void (*destructor_table[TYPE_COUNT])(void *);  

//we use this function to associate one of our types and its destructor
gc_register_destructor(ENVOBJ, envobj_free); 
gc_register_destructor(CLOSURE, closure_free); 
gc_register_destructor(LIST, list_free); 
gc_register_destructor(STANDARD, standard_free); 

//e.g. 
void standard_free(void *ptr) {   
  free(ptr); 
}
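
gc_register_destructor itself is presumably little more than an assignment into that table.  A minimal sketch, assuming the collector lives in a _gc global like the one used in gc_collect below (the real version is in the repo):

//store the destructor in the slot for its type
void
gc_register_destructor(TYPE type, void (*destructor)(void *)) {
  _gc.destructor_table[type] = destructor;
}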

With a list of typed pointers and a mapping from types to destructors, our garbage collection routine is actually quite straightforward:

//iterate through our list of references, calling the appropriate
//destructor of each.
void
gc_collect(void) {
  ref *curr = _gc.unmarked;
  while (curr != NULL) {
    ref *next = curr->next;
    (*(_gc.destructor_table[curr->type]))(curr->ptr);
    free(curr);
    curr = next;   
  }
  _gc.unmarked = NULL;
}

Using the garbage collector

Using the garbage collector is simple.  Just register a pointer and its type with the collector.  Since malloc already tracks the size of each allocation, we leave size tracking to malloc.

void
gc_register(void *obj, TYPE type)

//an example from inside our list object
list *
newitem(void *v) {
  list *o = malloc(sizeof(list));
  if (o == NULL) {
    exit(1);
  }
  o->val = v;
  o->next = NULL;
  gc_register((void *)o, LIST);
  return o;
}
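
Under the hood, gc_register just needs to wrap the pointer in a ref and append it to the unmarked list.  A minimal sketch, using the last_unmarked tail pointer to keep the append O(1) as described earlier (the real implementation is in the repo):

//wrap the pointer in a ref and append it to the unmarked list in O(1)
void
gc_register(void *obj, TYPE type) {
  ref *r = malloc(sizeof(ref));
  if (r == NULL) {
    exit(1);
  }
  r->ptr = obj;
  r->type = type;
  r->next = NULL;
  if (_gc.unmarked == NULL) {
    _gc.unmarked = r;
  } else {
    _gc.last_unmarked->next = r;
  }
  _gc.last_unmarked = r;
}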

Easy Improvements – Better Destructors & Marking

The first major area for improvement in this collector is better destructors.  Right now, lots of overhead is created because lists are tracked element by element.  Improving this is not too hard and would significantly lower the overhead per tracked byte of memory.

Another feature that i am currently working on is the ability to mark an object for safe keeping.  When an object is marked, it is still tracked by the garbage tracker, but it will not be freed until it is unmarked.  The interface for this should be simple:

void
gc_mark(void *obj)

void
gc_unmark(void *obj)
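
Neither is implemented yet, but one straightforward (if O(n)) way to write gc_mark would be to walk the unmarked list, unlink the matching ref, and append it to the marked list; gc_unmark would do the reverse.  A sketch of that idea:

//a possible gc_mark: find the ref for obj in the unmarked list, unlink it,
//and move it to the marked list
void
gc_mark(void *obj) {
  ref *prev = NULL;
  ref *curr = _gc.unmarked;
  while (curr != NULL && curr->ptr != obj) {
    prev = curr;
    curr = curr->next;
  }
  if (curr == NULL) {
    return; //not tracked
  }
  //unlink from the unmarked list, fixing up the tail pointer if needed
  if (prev == NULL) {
    _gc.unmarked = curr->next;
  } else {
    prev->next = curr->next;
  }
  if (curr == _gc.last_unmarked) {
    _gc.last_unmarked = prev;
  }
  //append to the marked list in O(1)
  curr->next = NULL;
  if (_gc.marked == NULL) {
    _gc.marked = curr;
  } else {
    _gc.last_marked->next = curr;
  }
  _gc.last_marked = curr;
}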

Conclusion

Putting everything together again:

int
main(int argc, char **argv) {
  //yep, added first version of garbage collector
  gc_init(); //initialize the garbage collector

  iter(map(range(0, 10), dbl,NULL), printint, NULL);
  iter(filter(range(0, 10), odd, NULL), printint, NULL); 

  //Darker magic?  Not really...
  closure *addtwo = bind(NULL, add, liftint(2));
  closure *addten = bind(NULL, add, liftint(10));

  printf("%d\n", *(int *)call(addtwo, liftint(3)));
  printf("%d\n", *(int *)call(addten, liftint(3)));

  //all together now, with pseudo types everywhere woopie!!!
  list *vars = liftlist(range(0, 10), sizeof(int));
  list *res = lmap(vars, addtwo);
  iter(res, printint, NULL);

  gc_print(); //show everything currently in the garbage collector

  gc_collect(); //you can guess what this does

  gc_print(); //anything left?
  exit(0);
}

The garbage tracker is working, but a full fledged collector is a long way off.  The tracker is unaware of program execution and does not count references.  Users must manually invoke the collector.  Going forward, i will improve on these issues.

Lists, closures, and lifted types all have garbage collection built in.

Cheers!

Functional C Programming: Higher order functions, closures, lifted types…

TL;DR

tl;dr You can implement many functional programming techniques in C.  Map, filter, closures, and lifted types are all very possible.  Function pointers are worth understanding. Code.


Functional C

I was having a conversation with a lover of Haskell who lamented C’s lack of functional constructs.  Eager to show him that C does in fact support functional programming, i decided to write a small, functional C library.  I posted the code here and hacker news seems to have enjoyed it.  

The following is a brief explanation of how to go about building a functional C library.  It walks through the basics and leaves the implementation details to the code itself.  Here is a more technical README.

Function Pointers

The basis of writing functional C is an understanding of the function pointer.  A good rule of thumb in C is that everything is and/or has a memory location (think & operator).  Functions are no different; they are locations of sequences of instructions that reside in memory.  Since this is the case, they can be passed into another function just like any other memory location.  Anyone who has used qsort or signals in C has used a function pointer before, to specify a comparison or a handler function, respectively.

e.g. the standard library’s qsort function takes a function pointer named compar:

void qsort ( void * base, 
             size_t num, 
             size_t size, 
             int ( * compar ) ( const void *, const void * ) );
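
For anyone who has not used it, here is the standard pattern: define a comparison function and hand its address to qsort.

#include <stdio.h>
#include <stdlib.h>

//comparison function; qsort receives a pointer to it
int
intcmp(const void *a, const void *b) {
  int x = *(const int *)a;
  int y = *(const int *)b;
  return (x > y) - (x < y);
}

int
main(void) {
  int nums[] = {4, 2, 9, 1, 7};
  qsort(nums, 5, sizeof(int), intcmp); //intcmp decays to a function pointer
  for (int i = 0; i < 5; i++) {
    printf("%d ", nums[i]);
  }
  printf("\n");
  return 0;
}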

Lists, Lists, Lists…

Once you can pass functions as values, the next step is picking a data structure to store sequences of things.  Lists are perfectly suited to this task. Haskell, ML, & Lisp all use lists.

Our list structure is pretty simple:

struct list {
  void *val;
  list *next;
};

Map, Filter & Other Higher Order Functions

With function pointers and lists, it’s time to define some simple higher order functions.

Map applies a function to a list of items, one by one, and returns the resulting list.

//yeah, it's map; args is an extra pointer that is passed as the val to each 
//invocation of fn.
list *
map(list *l, void *(*fn)(void *, void *), void *args)

Filter takes a list and returns a list of only those items that return true when passed into the function.

//yup, and filter too
list *
filter(list *l, bool (*fn)(void *, void *), void *args)
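
The actual implementations live in the repo, but map in particular is only a few lines.  A sketch, assuming a small newitem constructor like the one shown in the garbage collector post (the dbl callback here is just an illustration; the real one is in the repo):

//walk the list, apply fn to each val, and build up a new list
list *
map(list *l, void *(*fn)(void *, void *), void *args) {
  list *head = NULL;
  list *tail = NULL;
  for (; l != NULL; l = l->next) {
    list *node = newitem(fn(l->val, args));
    if (head == NULL) {
      head = node;
    } else {
      tail->next = node;
    }
    tail = node;
  }
  return head;
}

//an illustrative callback to pass to map: double an int in place
void *
dbl(void *val, void *args) {
  *(int *)val *= 2;
  return val;
}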

Closures

To close a function, one binds it to the environment it is to be executed in.  E.g. if i have a function that takes in two integers, a and b, and returns their sum, i can bind 2 to the value b to get a new function, called addtwo, that adds two to any number i put into it.

To implement closures, we need a few things:

  1. A way to store environments and their variables
  2. A way to bind environments to functions to create closures
  3. A way to call a closure and execute its function within its environment

To store our environments, we use lists of environment variables (envobjs) that store each value along with its size in memory.  To store a closure, we store a function pointer and an environment.

struct closure {
  void *(*fn)(list *);
  list *env;
};

struct envobj {
  void *val;
  ssize_t size;
};

Environments store variables with their size metadata in order to permit polymorphic C functions.  I’ll write more about this later.  To simplify lifting a standard variable into an environment for binding to a function, i’ve written some helper functions:

//returns a lifted integer
envobj *
liftint(int a);

//this transforms a list into an environment
list *
liftlist(list *l, ssize_t s)
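
liftint, for example, plausibly just copies the int onto the heap and wraps it with its size.  A sketch (the real version is in the repo):

//heap-allocate a copy of the int and record its size in an envobj
envobj *
liftint(int a) {
  envobj *e = malloc(sizeof(envobj));
  int *v = malloc(sizeof(int));
  if (e == NULL || v == NULL) {
    exit(1);
  }
  *v = a;
  e->val = v;
  e->size = sizeof(int);
  return e;
}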

These lifted types can be bound to a function that takes a list of arguments as its input via the bind function:

closure *
bind(closure *c, void *(*fn)(list *), envobj *env);

Finally, with a closure obtained, we can call the closure in its environment with the call function:

void *
call(closure *c, envobj *env);
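
To make the calling convention concrete, here is a hypothetical add that the closures in the example below could be built from.  It pulls its two operands out of the environment list; the real add in the repo may read its environment differently, so treat this as an illustration only.

//pull two lifted ints out of the environment list, sum them, and return a
//pointer to the heap-allocated result
void *
add(list *env) {
  envobj *a = env->val;
  envobj *b = env->next->val;
  int *sum = malloc(sizeof(int));
  if (sum == NULL) {
    exit(1);
  }
  *sum = *(int *)a->val + *(int *)b->val;
  return sum;
}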

Conclusion

Putting it all together now:

int
main(int argc, char **argv) {
  //LET THE MEMORY LEAKING BEGIN!!!
  //range returns a list of integers from start to finish
  iter(map(range(0, 10), dbl,NULL), printint, NULL);
  iter(filter(range(0, 10), odd, NULL), printint, NULL); 

  //Darker magic?  Not really...
  closure *addtwo = bind(NULL, add, liftint(2));
  closure *addten = bind(NULL, add, liftint(10));

  printf("%d\n", *(int *)call(addtwo, liftint(3)));
  printf("%d\n", *(int *)call(addten, liftint(3)));

  //all together now, with pseudo types everywhere woopie!!!
  list *vars = liftlist(range(0, 10), sizeof(int));
  list *res = lmap(vars, addtwo);
  iter(res, printint, NULL);

  exit(0);
}

Lambda functions and memory management are two things that i am working on.

As a disclaimer, i don’t feel that you should program this way.  This is the wrong way to do C programming.  However, it’s good to make sure that you understand function pointers.  They are useful.

 

Machine Learning: Building Image Based Search

tl;dr It’s not too bad to build a simple search engine where you put in a picture and get back both similar images and potential labels.  The distance between the gist descriptors of two images provides a useful way to determine their similarity.  Therefore, it is simple to build an image classifier from gists. Code.

super tl;dr [images: Thingy, Taxi, Head, Cat]

Computers can tell which ones of these are cats.

Image Based Search

For hack@UChicago’s winter (un)hackathon two friends (Falcon, Jeremy) and I decided to build an image based search engine.  Essentially, you upload a picture and we return two things:

  1. 10 most similar images
  2. Keywords labeling the image and their associated likelihoods

To build this search engine, you’ll need a couple of things:

  1. Some labelled data.  I have included the data we used here.  It comes from the tiny image dataset.  (These are very noisy labels, but the robustness of the gist allows successful classification even with this noise.)
  2. A way to describe the image.  We use gists.  Python lib for their computation.
  3. A way to find the closest gists to an input image.  Here’s our code.

But what’s the gist of an image?

To determine the similarity of images, we used gist descriptors, an image description technique that comes from MIT.  A gist descriptor is a sequence of numbers that describes an image in its entirety.  It summarizes the “naturalness, openness, roughness, expansion, ruggedness” of an image through a series of mathematical manipulations focusing heavily on the discrete Fourier transform.

How do you find how ‘close’ two images are?

Each gist is a vector of 960 floating point numbers.  The distance between two images is simply the Euclidean distance between their gist vectors.
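
Our implementation is in Python, but the arithmetic itself is trivial.  In C it would look roughly like this (GIST_DIM is just a name i am using here for the 960-dimensional descriptor length):

#include <math.h>

#define GIST_DIM 960

//euclidean distance between two gist descriptors
double
gist_distance(const float *a, const float *b) {
  double sum = 0.0;
  for (int i = 0; i < GIST_DIM; i++) {
    double d = (double)a[i] - (double)b[i];
    sum += d * d;
  }
  return sqrt(sum);
}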

How is a search actually done?

We precompute the gists for the training data and load these into memory when the server starts up.  When someone uploads an image, we calculate its gist.  We find the distance from this gist to each of the others and sort the gists from closest to furthest.  We return the 10 closest images.

And the labels?

As for the labels, we use a weighted average of the keywords associated with the 100 closest images.  We return the labels in sorted order with their corresponding likelihood.

Comments and extra resources

I hope this demystifies machine learning and image recognition a little bit.  I hope to continue my series on machine learning and machine learning infrastructure in the coming weeks.

If you want to build other datasets, here is my python library to extract data from the tiny image dataset by keyword: PyTinyImage

 

 

 

A Tail of Woe: The Trouble with fstat Caching

tl;dr Subsequent calls to fstat may return stale file info such as size, last accessed time, etc., no matter what file system you are using.  Always close and reopen the file when polling its size.  e.g. code

In order to prep for a coding interview, I was re-implementing the *nix tail command.  As a refresher, the tail command prints the last 10 lines of a file and, if passed the -f flag, will continue to watch the file, printing to stdout anything written to it.

To account for the -f flag, we poll the size of the file, calling fstat periodically in order to determine the file’s current size.  i.e.

   
//fd is our file descriptor and follow is a bool telling us to follow the file
if (follow) {
  while (1) {
    struct stat sbuf2;
    fstat(fd, &sbuf2);
    if (remsize < sbuf2.st_size) {
      diff = sbuf2.st_size - remsize;
      lseek(fd, --remsize, SEEK_SET);
      while (diff > 0) {
        incr = (diff < BUFSIZE) ? diff : BUFSIZE;
        int r = read(fd, buf, incr);
        if (r <= 0)
          break;
        write(1, buf, r); //only write the bytes we actually read
        diff -= r;
      }
      remsize = sbuf2.st_size;
    }
    sleep(LOOPSLEEP);
  }
}

While the idea behind this code is fine, it contains a subtle problem: fstat may cache the file attributes and return the same values on subsequent calls even though the underlying file has been modified.

I first thought that this only applied to NFS: http://cow.physics.wisc.edu/~craigm/idl/archive/msg00399.html

I then went on to test it locally.  My local machine is running Ubuntu 12.04 with the ext4 filesystem. Locally too, i experienced the same problem.

The solution to this issue is simple: close and then reopen the file in order to force the OS to refresh the file attribute cache:

if (follow) {
  while (1) {
    //http://cow.physics.wisc.edu/~craigm/idl/archive/msg00399.html
    // - seems to be an issue with many file systems; local ext3 on ubuntu 12.04 also caches fstat
    // - fstat may not return newest file size in files contained in NFS
    // - teach me to dev on ec2
    // no dice fchown(fd, sbuf.st_uid, sbuf.st_gid);
    // no dice fcntl
    if (fd > 0) {
      close(fd);
      fd = open(filepath, O_RDONLY, 0);
    }

    struct stat sbuf2;
    fstat(fd, &sbuf2);
    if (remsize < sbuf2.st_size) {
      diff = sbuf2.st_size - remsize;
      lseek(fd, --remsize, SEEK_SET);
      while (diff > 0) {
        incr = (diff < BUFSIZE) ? diff : BUFSIZE;
        int r = read(fd, buf, incr);
        if (r <= 0)
          break;
        write(1, buf, r); //only write the bytes we actually read
        diff -= r;
      }
      remsize = sbuf2.st_size;
    }
    sleep(LOOPSLEEP);
  }
}

I did not find that fcntl and fchown necessarily forced the file attributes to be reloaded.  It is safer to close and open the file.  Others have concluded similarly: http://irccrew.org/~cras/nfs-coding-howto.html