CcdCache

class openff.pablo.ccd.CcdCache(library_paths: Iterable[Path | str], cache_path: Path | str = Path(xdg_base_dir.save_cache_path('openff-pablo'), 'ccd_cache'), preload: list[str] = [], patches: Iterable[Mapping[str, Callable[[ResidueDefinition], list[ResidueDefinition]]]] = {}, extra_definitions: Mapping[str, Iterable[ResidueDefinition]] = {})[source]

Bases: Mapping[str, tuple[ResidueDefinition, …]]

Caches, patches, and presents the CCD as a Python Mapping.

This class is a wrapper around a dict that stores residue definitions. When a residue is requested via the indexing syntax (for example, my_ccd_cache["ALA"]) or the in operator, this dictionary is checked first. If the residue is not present, the CCD is then checked. If the residue cannot be retrieved from the inner dict or the CCD, a KeyError is raised or False is returned as appropriate.

Iterating over the mapping, checking its length, or otherwise treating the CcdCache as a mapping other than with the indexing syntax or in operator works only on the inner dict. As a result, accessing a residue via indexing may return a value even if these other methods suggest it won’t.
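
The lookup behaviour described above can be illustrated with a small self-contained sketch. This is not the Pablo implementation: LazyMapping is a hypothetical stand-in, a plain dict plays the role of the CCD, strings stand in for residue definitions, and storing an entry after a successful in check is an assumption of the sketch.

```python
from collections.abc import Iterator, Mapping


class LazyMapping(Mapping[str, tuple[str, ...]]):
    """Hypothetical stand-in for CcdCache: an inner dict plus a fallback source."""

    def __init__(self, remote: dict[str, tuple[str, ...]]) -> None:
        self._inner: dict[str, tuple[str, ...]] = {}
        self._remote = remote  # plays the role of the CCD in this sketch

    def __getitem__(self, key: str) -> tuple[str, ...]:
        if key not in self._inner:
            # Raises KeyError if the fallback source lacks the key as well
            self._inner[key] = self._remote[key]
        return self._inner[key]

    def __contains__(self, key: object) -> bool:
        try:
            self[key]
        except KeyError:
            return False
        return True

    def __iter__(self) -> Iterator[str]:
        # Only the inner dict is visible to iteration
        return iter(self._inner)

    def __len__(self) -> int:
        # Only the inner dict is counted
        return len(self._inner)


cache = LazyMapping({"ALA": ("alanine",)})
before = (len(cache), list(cache))  # nothing has been requested yet
found = "ALA" in cache              # falls back to the remote source
missing = "XXX" in cache            # absent everywhere
after = (len(cache), list(cache))
```

Before any lookup the mapping looks empty, even though "ALA" can be retrieved; this is the mismatch between indexing and the other mapping methods that the paragraph above warns about.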

CcdCache can apply patches to the entries it downloads from the CCD. This is used to work around known errors, deficiencies and inconsistencies in the CCD definitions. Patches are specified as functions that take a single residue definition and return a list of them.

The extra_definitions parameter and the with_() and with_replaced() methods allow custom definitions to be added to a CcdCache. These custom definitions are not patched. Since they are stored alongside cached entries in the inner dictionary, custom definitions supersede any that have not already been downloaded from the CCD.

When a CCD entry is downloaded, the corresponding CIF file is stored in the cache_path. This means that each entry will be downloaded only once, even across multiple Python invocations. All entries in the cache are loaded (and patches applied) when a new CcdCache is created. CcdCache assumes that the files in the cache path were downloaded from the CCD and may do unexpected things if they are edited by hand.
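
The download-once behaviour can be sketched as follows. The load_entry and fake_fetch names, and the "<name>.cif" file naming, are assumptions made for illustration only; they are not taken from Pablo.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Callable


def load_entry(name: str, cache_dir: Path, fetch: Callable[[str], str]) -> str:
    """Return the text of an entry, fetching it only if it is not yet on disk."""
    path = cache_dir / f"{name}.cif"  # hypothetical file naming scheme
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(fetch(name))
    return path.read_text()


calls: list[str] = []


def fake_fetch(name: str) -> str:
    """Stand-in for a network download; records every call."""
    calls.append(name)
    return f"data_{name}\n"


with TemporaryDirectory() as tmp:
    first = load_entry("ALA", Path(tmp), fake_fetch)
    second = load_entry("ALA", Path(tmp), fake_fetch)  # served from disk
```

The second call never reaches fake_fetch, because the file written by the first call is found in the cache directory.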

Users may provide additional CCD entries by specifying library paths. By default, this is used to ship commonly used residues with Pablo. At the moment, patches are applied to files in library paths, but it is likely that in the future they won’t be, and that residues shipped with Pablo will be pre-patched to speed up load times. As with the cache, files from library paths are loaded when a CcdCache is created.

Accessing the CCD requires internet access. Without internet access, entries from the cache or library paths can still be loaded, as can any entries added to an instance of this class.

Parameters:
  • library_paths – Paths to search for user-provided or packaged CCD entries. All paths are searched.

  • cache_path – The path to which to download CCD entries. This path is searched in addition to library_paths.

  • preload – A list of residue names to download when initializing the class.

  • patches – Functions to call on ResidueDefinitions downloaded from the CCD before they are returned or added to the inner dict. Given as an iterable of mappings, each from residue names to a single callable. The mappings are applied in the order they are iterated over, each to residues with a matching name. A patch stored under the key "*" is applied to all residues, before the more specific patch in the same mapping.

  • extra_definitions – Additional residue definitions to add to the cache. Note that patches are not applied to these definitions.
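
The ordering rule for patches can be sketched as below. This is an illustration under assumptions, not Pablo's code: apply_patches is a hypothetical helper, strings stand in for ResidueDefinition objects, and feeding the output of one patch into the next is assumed.

```python
from collections.abc import Callable, Iterable, Mapping

# Strings stand in for ResidueDefinition objects in this sketch
Patch = Callable[[str], list[str]]


def apply_patches(
    name: str,
    definition: str,
    patches: Iterable[Mapping[str, Patch]],
) -> list[str]:
    """Apply each mapping in order; within a mapping, "*" runs before the named patch."""
    definitions = [definition]
    for patch_map in patches:
        for key in ("*", name):
            patch = patch_map.get(key)
            if patch is not None:
                # Each patch returns a list, so the results are flattened
                definitions = [new for old in definitions for new in patch(old)]
    return definitions


patches = [
    {"ALA": lambda d: [d + "+ala"], "*": lambda d: [d + "+all"]},
    {"ALA": lambda d: [d + "+second"]},
]
ala = apply_patches("ALA", "ALA", patches)
gly = apply_patches("GLY", "GLY", patches)
```

Here ala records the order in which the patches ran, with the "*" patch first even though it appears second in its mapping, while gly is touched only by the "*" patch.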

Instance and Static Methods

with_

Get a copy of this CcdCache with additional definitions added.

with_patch

Add a patch to the residues loaded via a copy of this CcdCache.

with_replaced

Get a copy of this CcdCache with some definitions replaced.

with_varied_protonation

Get a copy of self with all combinations of some protonation states.

with_virtual_sites

Copy self, adding new residue definitions requiring some virtual sites.

with_vsite_water

Copy self, adding new definitions for common multisite water models.

without

Get a copy of this CcdCache lacking any definitions with some names.

with_(definitions: Mapping[str, Sequence[ResidueDefinition]] | Sequence[ResidueDefinition]) Self[source]

Get a copy of this CcdCache with additional definitions added.

Definitions may be supplied as a mapping from residue names to sequences of residue definitions, or as a sequence of residue definitions. In the latter case, the residue names are taken from the residue definitions themselves.

Note that patches are not applied to the new definitions.

Examples

Add a custom definition to the STD_CCD_CACHE. We use a 4-letter residue code as they are supported by Pablo’s PDB reader and do not clash with the CCD’s definitions.

>>> from openff.pablo import STD_CCD_CACHE, ResidueDefinition
>>> my_ccd_cache = STD_CCD_CACHE.with_([
...     ResidueDefinition.from_smiles(
...         "[H:1][O:2][O:3][H:4]",
...         {1: "H1", 2: "O1", 3: "O2", 4: "H2"},
...         "HOOH",
...     )
... ])

Add protonation variants of a residue by specifying acidic and basic atoms.

>>> from openff.pablo import STD_CCD_CACHE, ResidueDefinition
>>>
>>> # Get the GABA (γ-amino butanoic acid) residue definition from CCD
>>> gaba_resdef = STD_CCD_CACHE["ABU"][0]
>>>
>>> # Generate the variants and add them to a new cache
>>> my_ccd_cache = STD_CCD_CACHE.with_({
...     "ABU": gaba_resdef.vary_protonation(
...         acidic=["HXT"], # Atom name of abstractable proton
...         basic=[("N", "H3")], # Atom to protonate, name of new proton
...     )[1:], # Skip the first entry, which is already in the cache
... })
>>> # Should have added three variants - positive, negative, zwitterion
>>> len(my_ccd_cache["ABU"]) - len(STD_CCD_CACHE["ABU"])
3

with_patch(residue_name: str, patch: Callable[[ResidueDefinition], list[ResidueDefinition]]) Self[source]

Add a patch to the residues loaded via a copy of this CcdCache.

The patch is added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

The patch function is called on each residue definition stored under the given residue name. The returned residue definitions are concatenated and replace the originals. Patches can therefore add, modify, split, or replace residue definitions depending on whether they include the original definition in the output.

The patch is applied to all definitions in the cache when this function is applied, as well as any definitions downloaded from the CCD in the future. It is not applied to definitions added by the other CcdCache.with_*() methods.
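
Because a patch returns a list and the lists are concatenated, the same mechanism covers adding, modifying, and removing definitions. A minimal sketch, with strings standing in for residue definitions and all function names hypothetical:

```python
from collections.abc import Callable


def patch_all(definitions: list[str], patch: Callable[[str], list[str]]) -> list[str]:
    """Call the patch on every definition and concatenate the returned lists."""
    return [new for old in definitions for new in patch(old)]


def keep_and_add(definition: str) -> list[str]:
    # Including the original in the output adds a variant alongside it
    return [definition, definition + "-variant"]


def modify(definition: str) -> list[str]:
    # Returning only a changed copy modifies the definition in place
    return [definition.lower()]


def drop(definition: str) -> list[str]:
    # Returning an empty list removes the definition
    return []


originals = ["A", "B"]
added = patch_all(originals, keep_and_add)
modified = patch_all(originals, modify)
dropped = patch_all(originals, drop)
```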

with_replaced(definitions: Mapping[str, Sequence[ResidueDefinition]] | Sequence[ResidueDefinition]) Self[source]

Get a copy of this CcdCache with some definitions replaced.

Similar to with_, but does not retain existing definitions for the specified residue names. All residue names that are keys of a definitions mapping or are residue names in a definitions sequence are removed from the new CcdCache before adding the new definitions.

Note that patches are not applied to the new definitions.
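
The difference between with_ and with_replaced can be sketched on plain dicts. The two helper functions below are hypothetical, strings stand in for residue definitions, and appending new definitions after existing ones is an assumption of the sketch.

```python
def sketch_with(
    inner: dict[str, tuple[str, ...]],
    new: dict[str, tuple[str, ...]],
) -> dict[str, tuple[str, ...]]:
    """Keep existing definitions and add the new ones alongside them."""
    merged = dict(inner)
    for name, definitions in new.items():
        merged[name] = merged.get(name, ()) + tuple(definitions)
    return merged


def sketch_with_replaced(
    inner: dict[str, tuple[str, ...]],
    new: dict[str, tuple[str, ...]],
) -> dict[str, tuple[str, ...]]:
    """Discard existing definitions for the given names, then add the new ones."""
    merged = {name: defs for name, defs in inner.items() if name not in new}
    for name, definitions in new.items():
        merged[name] = tuple(definitions)
    return merged


inner = {"ABU": ("neutral",), "ALA": ("alanine",)}
new = {"ABU": ("zwitterion",)}
added = sketch_with(inner, new)
replaced = sketch_with_replaced(inner, new)
```

In both cases the original dict is left untouched, mirroring the copy semantics of the real methods.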

See also

with_, without

with_varied_protonation(residue_name: str, *, acidic: Iterable[str] = (), basic: Iterable[tuple[str, str]] = ()) Self[source]

Get a copy of self with all combinations of some protonation states.

Note that all combinations of protonations and deprotonations are generated; this means that if acidic has length n and basic has length m, 2**(n+m) variants will be generated for each existing variant.

If no variants at all are generated, PabloError is raised. Otherwise, whatever variants make sense are created for each existing variant.

This method will download the given residue name from the CCD if it is not already in the cache.

Parameters:
  • residue_name – The name of the residue to generate alternate protonation states for.

  • acidic – Existing hydrogen atoms that can be removed to form a new protonation state. Each element specifies an atom name to remove, decrementing the formal charge on the neighbouring heavy atom. Multiply bonded, unbonded, missing, or non-hydrogen atoms are skipped unless no variants at all are generated.

  • basic – Existing non-hydrogen atoms that can be protonated to form a new protonation state, as well as the canonical name of the new hydrogen. Each tuple specifies an atom name to protonate (increment the formal charge and form a bond) and the name of the added proton. Unknown heavy atoms and new atom names that clash with existing names are skipped unless no variants at all are generated.
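
The 2**(n+m) count follows from each acidic or basic site being independently either changed or left alone. The sketch below only enumerates the combinations; protonation_combinations is a hypothetical helper and no chemistry is performed.

```python
from itertools import product


def protonation_combinations(
    acidic: list[str],
    basic: list[tuple[str, str]],
) -> list[list[tuple[str, object]]]:
    """Enumerate every subset of the given deprotonations and protonations."""
    sites: list[tuple[str, object]] = [("deprotonate", name) for name in acidic]
    sites += [("protonate", pair) for pair in basic]
    combinations = []
    # One boolean per site: whether that site is changed in this variant
    for mask in product([False, True], repeat=len(sites)):
        combinations.append([site for site, chosen in zip(sites, mask) if chosen])
    return combinations


combos = protonation_combinations(acidic=["HXT"], basic=[("N", "H3")])
```

With one acidic and one basic site there are four combinations, the first of which changes nothing. This is consistent with the with_ example, where skipping the first, unchanged entry leaves three new variants.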

with_virtual_sites(residue_name: str, virtual_sites: Iterable[str]) Self[source]

Copy self, adding new residue definitions requiring some virtual sites.

The new definition is added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

A new residue definition is added for each definition currently stored in the cache under the given name. The new definition requires that all the given virtual site names be present in order for it to match, and it discards the corresponding ATOM/HETATM records.

This method works by adding a patch. It will affect any residue definition already added to the cache under the given name, or any definition downloaded in the future, but not any definition added in the future by the with_ or with_replaced methods.

with_vsite_water() Self[source]

Copy self, adding new definitions for common multisite water models.

The new definitions are added to a copy of the CcdCache, and the copy is returned. The original CcdCache is left unmodified.

The new definitions require that all the virtual site names be present in order for them to match, and they discard the corresponding ATOM/HETATM records. The name for the 4-point model virtual site is EPW, and for the 5-point model EP1 and EP2.

This method works by adding a patch. It will affect any 3-atom residue definitions already added to the cache under the names HOH, WAT, or SOL, or any so-named definition downloaded in the future, but not any definition added in the future by the with_ or with_replaced methods.
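
The matching rule for virtual sites (all named sites must be present, and their records are then discarded) can be sketched on lists of atom names. The strip_virtual_sites helper is hypothetical and much simpler than real residue matching; the water atom names O, H1, and H2 are illustrative.

```python
from typing import Optional


def strip_virtual_sites(
    atom_names: list[str],
    virtual_sites: list[str],
) -> Optional[list[str]]:
    """Drop virtual site records, or return None if a required site is missing."""
    required = set(virtual_sites)
    if not required.issubset(atom_names):
        return None  # the definition requiring these sites does not match
    return [name for name in atom_names if name not in required]


four_point = strip_virtual_sites(["O", "H1", "H2", "EPW"], ["EPW"])
five_point = strip_virtual_sites(["O", "H1", "H2", "EP1", "EP2"], ["EP1", "EP2"])
three_point = strip_virtual_sites(["O", "H1", "H2"], ["EPW"])
```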

without(residue_names: Iterable[str]) Self[source]

Get a copy of this CcdCache lacking any definitions with some names.

No definitions for any of the given residue names will be present in the new cache. Note that residues that are in the CCD will still be returned when they are requested, as long as they can be re-downloaded or found in the cache.
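
The caveat about CCD residues can be sketched with a dict and a fallback source. The lookup helper and both dicts are hypothetical stand-ins, not Pablo objects.

```python
def lookup(
    inner: dict[str, tuple[str, ...]],
    remote: dict[str, tuple[str, ...]],
    name: str,
) -> tuple[str, ...]:
    """Check the inner dict first, then fall back to the remote source."""
    if name not in inner:
        inner[name] = remote[name]  # raises KeyError if absent there too
    return inner[name]


remote = {"HOH": ("water",)}  # plays the role of the CCD
inner = {"HOH": ("water",), "HOOH": ("custom peroxide",)}

# Sketch of without(["HOH", "HOOH"]): drop both names from a copy
removed = {"HOH", "HOOH"}
trimmed = {name: defs for name, defs in inner.items() if name not in removed}

water = lookup(trimmed, remote, "HOH")  # found again via the fallback
try:
    lookup(trimmed, remote, "HOOH")
    custom_still_available = True
except KeyError:
    custom_still_available = False  # the custom entry is gone for good
```

Only the custom definition is truly removed; the entry that also exists in the fallback source reappears on request.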

See also

with_replaced, with_