Document the data files and the database generation

In the README, add a section describing the expected FASTA file structure and how these files are used to produce the database: species, chains, IDs, hash...