2. Why Pablo is different
PDB files are a highly permissive format widely used in molecular modeling. The rules for how a PDB file should be written or read are vague enough, and the chemistry familiar enough, that the format often runs on the principle of “you know what I mean”. For traditional MD workflows, this permissiveness is an asset. Simulation engines can write PDB files with arbitrary virtual sites or coarse-grainings, safe in the knowledge that most tools will be able to make do, at least in the biochemical context where the user has force field parameters.
The OpenFF ecosystem, however, relies on detailed chemical information about the modeled molecular system. In particular, we require a complete chemical graph: every atom, including hydrogens, with its atomic number and formal charge, and the connectivity between atoms with bond orders. We use this rich chemical information to assign accurate force field parameters to an enormous variety of molecules. By contrast, the PDB format has no facility for specifying bond orders, and PDB files routinely omit bonds, formal charges, and hydrogen atoms altogether.
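To make the gap concrete, here is a minimal sketch of what a single ATOM record can carry. It slices the fixed-width columns defined by the PDB format; the function name `parse_atom_record` and the example atom are illustrative only and are not part of Pablo. Note that the record has columns for an element and an optional charge, but nothing that describes bonds or bond orders.

```python
# Build an 80-column ATOM record from its fixed-width fields so the
# column layout is explicit (the values are an arbitrary example atom).
line = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2}{:>2}"
).format("ATOM", 1, " N", "", "ALA", "A", 1, "",
         11.104, 6.134, -6.504, 1.00, 0.00, "N", "")


def parse_atom_record(record):
    """Slice the fixed-width columns of one ATOM/HETATM record."""
    return {
        "serial": int(record[6:11]),
        "name": record[12:16].strip(),
        "resname": record[17:20].strip(),
        "chain": record[21],
        "resseq": int(record[22:26]),
        "xyz": (float(record[30:38]),
                float(record[38:46]),
                float(record[46:54])),
        # Element and charge columns exist but are often left blank.
        "element": record[76:78].strip(),
        "charge": record[78:80].strip(),
    }


atom = parse_atom_record(line)
print(atom)
```

Everything else in the chemical graph, such as which atoms are bonded and with what order, has to come from somewhere other than this record.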
Pablo is designed to be able to interpret the chemistry of as many PDB files as possible while not introducing ambiguity, incorrect chemical inference, or dependence on structure to interpret chemistry. Pablo therefore does not infer bonds from proximity, does not infer the presence of any missing atoms including hydrogens, and may fail to load PDB files that other software has no trouble with. It may also require more information than you’re used to!
In particular, Pablo requires a library of chemical information to match the PDB file against. This library takes the basic form of a mapping from residue and atom names to chemical information. Known residues can also be matched on the basis of CONECT records and elements if their atoms are not named in the usual ways. The most common residues used in biomolecular modeling are shipped with Pablo as part of its standard library, and a much larger library provided by the RCSB PDB can be fetched automatically over the internet. Users who routinely work with unusual chemistries can also augment or replace the library with their own residue templates.
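The idea of matching against a library can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Pablo's actual API or data model: `ResidueTemplate`, `TOY_LIBRARY`, `NoMatchError`, and `match_residue` are all hypothetical names, and the matching here is by residue and atom names only.

```python
from dataclasses import dataclass


@dataclass
class ResidueTemplate:
    """Hypothetical template: the chemistry implied by a residue name."""
    atoms: dict   # atom name -> (element symbol, formal charge)
    bonds: tuple  # (atom name, atom name, bond order)


# A toy library with a single entry: water under the residue name HOH.
TOY_LIBRARY = {
    "HOH": ResidueTemplate(
        atoms={"O": ("O", 0), "H1": ("H", 0), "H2": ("H", 0)},
        bonds=(("O", "H1", 1), ("O", "H2", 1)),
    ),
}


class NoMatchError(ValueError):
    """Raised when a residue cannot be matched without guessing."""


def match_residue(resname, atom_names, library=TOY_LIBRARY):
    """Return the template for a residue, or fail rather than infer."""
    template = library.get(resname)
    if template is None:
        raise NoMatchError(f"no template for residue {resname!r}")
    missing = set(template.atoms) - set(atom_names)
    extra = set(atom_names) - set(template.atoms)
    if missing or extra:
        # Missing atoms (for example hydrogens) are reported, never added.
        raise NoMatchError(
            f"{resname}: missing {sorted(missing)}, unexpected {sorted(extra)}"
        )
    return template


template = match_residue("HOH", ["O", "H1", "H2"])
print(template.bonds)
```

In this sketch, a water residue that lists only its oxygen raises `NoMatchError` instead of having hydrogens added, mirroring the design choice described above: elements, formal charges, and bond orders come from the template, never from guesswork about the coordinates.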