Toptal
The Protein Data Bank (PDB) is a bioinformatics database and the world's largest repository of experimentally determined structures of proteins, nucleic acids, and complex assemblies. All of its data is obtained with experimental methods such as X-ray crystallography, NMR spectroscopy, and electron microscopy.
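Each entry records which experimental method produced it and, where applicable, the resolution of the structure. Both fields matter for filtering later. The sketch below reads them from legacy PDB-format text, assuming the standard EXPDTA record and the "REMARK   2 RESOLUTION." line. The function name parse_header is hypothetical, and mmCIF files store the same information in different fields, so this is an illustration rather than a complete parser.

```python
import re


def parse_header(pdb_text: str) -> dict:
    """Extract the experimental method and resolution from PDB-format text.

    Hypothetical helper: only the EXPDTA record and the REMARK 2 resolution
    line are inspected; everything else in the file is ignored.
    """
    method = None
    resolution = None
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # Columns 11-79 hold the technique, e.g. "X-RAY DIFFRACTION".
            method = line[10:79].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            # NMR entries say "NOT APPLICABLE", so no number is found there.
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
    return {"method": method, "resolution": resolution}


header = (
    "EXPDTA    X-RAY DIFFRACTION\n"
    "REMARK   2\n"
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.\n"
)
print(parse_header(header))  # {'method': 'X-RAY DIFFRACTION', 'resolution': 1.74}
```

For NMR entries the resolution comes back as None, which later filtering steps need to handle explicitly.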
This article explains how to extract, filter, and clean data from the PDB. This, in turn, enables the kind of analysis described in the article "Occurrence of protein disulfide bonds in different domains of life: a comparison of proteins from the Protein Data Bank", published in Protein Engineering, Design and Selection, Volume 27, Issue 3, 1 March 2014, pp. 65–72.
The PDB contains many repeated structures of the same protein that differ in resolution, experimental method, mutations, and so on. Including the same or similar proteins more than once can bias any group-level analysis, so we need to choose the correct structure from each set of duplicates. For that purpose, we need to use a non-redundant
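As a rough illustration of choosing one structure per set of duplicates, the sketch below groups entries by identical sequence and keeps the one with the best (numerically lowest) resolution, placing entries without a resolution, such as NMR structures, last. This is a simplified stand-in, not necessarily the author's approach: published non-redundant sets usually cluster chains by a sequence-identity threshold rather than exact matches. The Entry fields, the function names, and the PDB IDs in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical record for one PDB chain."""
    pdb_id: str
    sequence: str                # one-letter amino-acid sequence
    method: str                  # e.g. "X-RAY DIFFRACTION", "SOLUTION NMR"
    resolution: Optional[float]  # in angstroms; None when not applicable


def quality_key(entry: Entry) -> tuple:
    # Entries with a resolution sort first, then lower (better) resolution,
    # then PDB ID so that ties are broken deterministically.
    return (entry.resolution is None, entry.resolution or 0.0, entry.pdb_id)


def pick_representatives(entries: list[Entry]) -> list[Entry]:
    """Keep the best entry for each identical sequence (exact-match grouping)."""
    best: dict[str, Entry] = {}
    for entry in entries:
        current = best.get(entry.sequence)
        if current is None or quality_key(entry) < quality_key(current):
            best[entry.sequence] = entry
    return sorted(best.values(), key=lambda e: e.pdb_id)


entries = [
    Entry("1AAA", "MKV", "X-RAY DIFFRACTION", 2.1),
    Entry("1AAB", "MKV", "X-RAY DIFFRACTION", 1.6),
    Entry("1AAC", "MKV", "SOLUTION NMR", None),
    Entry("2BBB", "GSH", "SOLUTION NMR", None),
]
print([e.pdb_id for e in pick_representatives(entries)])  # ['1AAB', '2BBB']
```

Swapping quality_key for a different ranking (for example, preferring wild-type over mutant structures) changes which duplicate survives without touching the grouping logic.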