RDKit
Open-source cheminformatics and machine learning.
Substructure search of a library of molecules.
#include <SubstructLibrary.h>
Public Member Functions
SubstructLibrary ()
SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules)
SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules, boost::shared_ptr< FPHolderBase > fingerprints)
SubstructLibrary (const std::string &pickle)
boost::shared_ptr< MolHolderBase > & | getMolHolder () |
Get the underlying molecule holder implementation.
const boost::shared_ptr< MolHolderBase > & | getMolHolder () const |
boost::shared_ptr< FPHolderBase > & | getFpHolder () |
Get the underlying fingerprint holder implementation.
const boost::shared_ptr< FPHolderBase > & | getFpHolder () const |
Get the underlying fingerprint holder implementation.
const MolHolderBase & | getMolecules () const |
FPHolderBase & | getFingerprints () |
Get the underlying fingerprint implementation.
const FPHolderBase & | getFingerprints () const |
unsigned int | addMol (const ROMol &mol) |
Add a molecule to the library.
std::vector< unsigned int > | getMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1) |
Get the matching indices for the query.
std::vector< unsigned int > | getMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1) |
Get the matching indices for the query between the given indices.
unsigned int | countMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1) |
Return the number of matches for the query.
unsigned int | countMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1) |
Return the number of matches for the query between the given indices.
bool | hasMatch (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1) |
Returns true if any match exists for the query.
bool | hasMatch (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1) |
Returns true if any match exists for the query between the specified indices.
boost::shared_ptr< ROMol > | getMol (unsigned int idx) const |
Returns the molecule at the given index.
boost::shared_ptr< ROMol > | operator[] (unsigned int idx) |
Returns the molecule at the given index.
unsigned int | size () const |
Returns the number of molecules in the library.
void | resetHolders () |
Access required for serialization.
void | toStream (std::ostream &ss) const |
Serializes (pickles) to a stream.
std::string | Serialize () const |
Returns a string with a serialized (pickled) representation.
void | initFromStream (std::istream &ss) |
Initializes from a stream pickle.
void | initFromString (const std::string &text) |
Initializes from a string pickle.
Substructure search of a library of molecules.
This class allows for multithreaded substructure searches of large datasets.
The implementations can use fingerprints to speed up searches and can cache molecules in binary form to reduce memory usage.
basic usage:
Using different mol holders and pattern fingerprints.
Cached molecule holders create molecules on demand. There are currently three kinds of cached molecule holder:
CachedMolHolder: stores molecules in the RDKit binary format.
CachedSmilesMolHolder: stores molecules in SMILES format.
CachedTrustedSmilesMolHolder: stores molecules in SMILES format, where the SMILES comes from a trusted source.
The CachedTrustedSmilesMolHolder is intended for molecules from a trusted source: it assumes that RDKit was used to sanitize and canonicalize the SMILES string. In practice this is considerably faster than using arbitrary SMILES strings, since certain assumptions can be made.
When loading from external data, as opposed to using the "addMol" API, care must be taken to ensure that the pattern fingerprints and SMILES are synchronized, i.e. that the fingerprint at a given index was generated from the SMILES at the same index.
Each pattern holder has an API point for making its fingerprint. This is useful to ensure that the fingerprints stored in the database are compatible with the fingerprints generated when analyzing queries.
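To illustrate why stored and query fingerprints must come from the same generator and stay index-aligned with the molecules, here is a toy model of fingerprint screening. It does not call RDKit: ToyLibrary, ToyFP, toyFingerprint, and addPrecomputed are hypothetical stand-ins, the "fingerprint" is just one bit per character, and a plain substring test stands in for real substructure matching.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using ToyFP = std::bitset<128>;

// Hypothetical fingerprint: one bit per distinct character.
// If `query` is a substring of `text`, every bit of toyFingerprint(query)
// is also set in toyFingerprint(text), so the screen never loses a true hit.
inline ToyFP toyFingerprint(const std::string &s) {
  ToyFP fp;
  for (unsigned char c : s) fp.set(c % 128);
  return fp;
}

struct ToyLibrary {
  std::vector<std::string> entries;  // stands in for the molecule holder
  std::vector<ToyFP> fps;            // stands in for the fingerprint holder

  // Adding through one call keeps both vectors in step.
  unsigned int add(const std::string &s) {
    return addPrecomputed(s, toyFingerprint(s));
  }

  // Loading external data: the caller is responsible for passing a
  // fingerprint that really belongs to `s`.
  unsigned int addPrecomputed(const std::string &s, const ToyFP &fp) {
    entries.push_back(s);
    fps.push_back(fp);
    return static_cast<unsigned int>(entries.size() - 1);
  }

  std::vector<unsigned int> getMatches(const std::string &query) const {
    assert(entries.size() == fps.size());
    const ToyFP qfp = toyFingerprint(query);
    std::vector<unsigned int> hits;
    for (std::size_t i = 0; i < entries.size(); ++i) {
      // Screen: skip entries whose fingerprint lacks any query bit.
      if ((qfp & fps[i]) != qfp) continue;
      // Only entries that pass the screen get the full comparison.
      if (entries[i].find(query) != std::string::npos) {
        hits.push_back(static_cast<unsigned int>(i));
      }
    }
    return hits;
  }
};
```

In this model, an entry loaded with a fingerprint that belongs to a different string is silently screened out even when it would match, which is the kind of false negative that unsynchronized external data can cause.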
Definition at line 360 of file SubstructLibrary.h.
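RDKit::SubstructLibrary::SubstructLibrary () [inline]
Definition at line 367 of file SubstructLibrary.h.
RDKit::SubstructLibrary::SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules) [inline]
Definition at line 373 of file SubstructLibrary.h.
RDKit::SubstructLibrary::SubstructLibrary (boost::shared_ptr< MolHolderBase > molecules, boost::shared_ptr< FPHolderBase > fingerprints) [inline]
Definition at line 376 of file SubstructLibrary.h.
RDKit::SubstructLibrary::SubstructLibrary (const std::string &pickle) [inline]
Definition at line 383 of file SubstructLibrary.h.
References RDKit::EnumerationStrategyPickler::pickle().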
unsigned int RDKit::SubstructLibrary::addMol (const ROMol &mol)
Add a molecule to the library.
mol | Molecule to add |
Returns the index of the molecule in the library.
unsigned int RDKit::SubstructLibrary::countMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
Return the number of matches for the query.
query | Query to match against molecules |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
unsigned int RDKit::SubstructLibrary::countMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
Return the number of matches for the query between the given indices.
query | Query to match against molecules |
startIdx | Start index of the search |
endIdx | Ending idx (non-inclusive) of the search. |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
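Because endIdx is non-inclusive, adjacent ranges [a, b) and [b, c) cover every index exactly once, so per-chunk counts can simply be added up. The sketch below is a toy model of that bookkeeping only: it does not call RDKit, and countInRange and countInChunks are hypothetical names that operate on a precomputed vector of match flags.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Count the set flags in the half-open range [startIdx, endIdx).
inline unsigned int countInRange(const std::vector<bool> &isMatch,
                                 unsigned int startIdx, unsigned int endIdx) {
  assert(startIdx <= endIdx && endIdx <= isMatch.size());
  unsigned int n = 0;
  for (unsigned int i = startIdx; i < endIdx; ++i) {
    if (isMatch[i]) ++n;
  }
  return n;
}

// Walk the whole vector in chunks; each chunk's end is the next chunk's
// start, so no index is counted twice or skipped.
inline unsigned int countInChunks(const std::vector<bool> &isMatch,
                                  unsigned int chunkSize) {
  assert(chunkSize > 0);
  const unsigned int size = static_cast<unsigned int>(isMatch.size());
  unsigned int total = 0;
  for (unsigned int start = 0; start < size; start += chunkSize) {
    const unsigned int end = std::min(size, start + chunkSize);
    total += countInRange(isMatch, start, end);
  }
  return total;
}
```

An empty range such as [3, 3) counts zero entries, which keeps the chunk loop free of special cases.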
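FPHolderBase & RDKit::SubstructLibrary::getFingerprints () [inline]
Get the underlying fingerprint implementation.
Throws a value error if no fingerprints have been set.
Definition at line 413 of file SubstructLibrary.h.
const FPHolderBase & RDKit::SubstructLibrary::getFingerprints () const [inline]
Definition at line 419 of file SubstructLibrary.h.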
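boost::shared_ptr< FPHolderBase > & RDKit::SubstructLibrary::getFpHolder () [inline]
Get the underlying fingerprint holder implementation.
Definition at line 399 of file SubstructLibrary.h.
const boost::shared_ptr< FPHolderBase > & RDKit::SubstructLibrary::getFpHolder () const [inline]
Get the underlying fingerprint holder implementation.
Definition at line 402 of file SubstructLibrary.h.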
std::vector<unsigned int> RDKit::SubstructLibrary::getMatches (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1)
Get the matching indices for the query.
query | Query to match against molecules |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
maxResults | Maximum results to return, -1 means return all [default -1] |
std::vector<unsigned int> RDKit::SubstructLibrary::getMatches (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1, int maxResults=-1)
Get the matching indices for the query between the given indices.
query | Query to match against molecules |
startIdx | Start index of the search |
endIdx | Ending idx (non-inclusive) of the search. |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
maxResults | Maximum results to return, -1 means return all [default -1] |
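The maxResults convention (-1 meaning "no limit") can be modelled as below. This is again a toy sketch that does not call RDKit: collectMatches and anyMatch are hypothetical, single-threaded functions over precomputed match flags, and the sketch makes no claim about which matches RDKit returns when a multithreaded search is truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Collect matching indices in [startIdx, endIdx), stopping once
// maxResults hits have been gathered. A negative limit means "all".
inline std::vector<unsigned int> collectMatches(
    const std::vector<bool> &isMatch, unsigned int startIdx,
    unsigned int endIdx, int maxResults = -1) {
  assert(startIdx <= endIdx && endIdx <= isMatch.size());
  std::vector<unsigned int> hits;
  for (unsigned int i = startIdx; i < endIdx; ++i) {
    if (maxResults >= 0 &&
        hits.size() >= static_cast<std::size_t>(maxResults)) {
      break;  // limit reached: no need to look at the remaining entries
    }
    if (isMatch[i]) hits.push_back(i);
  }
  return hits;
}

// A hasMatch-style question only needs a single hit.
inline bool anyMatch(const std::vector<bool> &isMatch, unsigned int startIdx,
                     unsigned int endIdx) {
  return !collectMatches(isMatch, startIdx, endIdx, 1).empty();
}
```

Capping the result count lets a search stop early, which matters most for common queries that match a large fraction of the library.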
boost::shared_ptr< ROMol > RDKit::SubstructLibrary::getMol (unsigned int idx) const [inline]
Returns the molecule at the given index.
idx | Index of the molecule in the library |
Definition at line 549 of file SubstructLibrary.h.
References RDKit::MolHolderBase::getMol(), and PRECONDITION.
const MolHolderBase & RDKit::SubstructLibrary::getMolecules () const [inline]
Definition at line 406 of file SubstructLibrary.h.
References PRECONDITION.
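boost::shared_ptr< MolHolderBase > & RDKit::SubstructLibrary::getMolHolder () [inline]
Get the underlying molecule holder implementation.
Definition at line 392 of file SubstructLibrary.h.
const boost::shared_ptr< MolHolderBase > & RDKit::SubstructLibrary::getMolHolder () const [inline]
Definition at line 394 of file SubstructLibrary.h.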
bool RDKit::SubstructLibrary::hasMatch (const ROMol &query, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
Returns true if any match exists for the query.
query | Query to match against molecules |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
bool RDKit::SubstructLibrary::hasMatch (const ROMol &query, unsigned int startIdx, unsigned int endIdx, bool recursionPossible=true, bool useChirality=true, bool useQueryQueryMatches=false, int numThreads=-1)
Returns true if any match exists for the query between the specified indices.
query | Query to match against molecules |
startIdx | Start index of the search |
endIdx | Ending idx (non-inclusive) of the search. |
recursionPossible | flags whether or not recursive matches are allowed [ default true ] |
useChirality | use atomic CIP codes as part of the comparison [ default true ] |
useQueryQueryMatches | if set, the contents of atom and bond queries will be used as part of the matching [ default false ] |
numThreads | If -1 use all available processors [default -1] |
void RDKit::SubstructLibrary::initFromStream (std::istream &ss)
Initializes from a stream pickle.
void RDKit::SubstructLibrary::initFromString (const std::string &text)
Initializes from a string pickle.
boost::shared_ptr< ROMol > RDKit::SubstructLibrary::operator[] (unsigned int idx) [inline]
Returns the molecule at the given index.
idx | Index of the molecule in the library |
Definition at line 559 of file SubstructLibrary.h.
References RDKit::MolHolderBase::getMol(), and PRECONDITION.
void RDKit::SubstructLibrary::resetHolders () [inline]
Access required for serialization.
Definition at line 572 of file SubstructLibrary.h.
std::string RDKit::SubstructLibrary::Serialize () const
Returns a string with a serialized (pickled) representation.
unsigned int RDKit::SubstructLibrary::size () const [inline]
Returns the number of molecules in the library.
Definition at line 566 of file SubstructLibrary.h.
References PRECONDITION.
void RDKit::SubstructLibrary::toStream (std::ostream &ss) const
Serializes (pickles) to a stream.