rxnutils.data package¶
Subpackages¶
- rxnutils.data.ord package
- rxnutils.data.uspto package
Submodules¶
rxnutils.data.base_pipeline module¶
Module containing base class for data pipelines
- class rxnutils.data.base_pipeline.DataBaseFlow(use_cli=True)¶
Bases:
FlowSpec
Base class for data-processing pipelines
- nbatches = metaflow.Parameter(name=nbatches, kwargs={})¶
- folder = metaflow.Parameter(name=folder, kwargs={})¶
- class rxnutils.data.base_pipeline.DataPreparationBaseFlow(use_cli=True)¶
Bases:
DataBaseFlow
Base pipeline for preparing datasets and doing clean-up
- data_prefix = ''¶
rxnutils.data.batch_utils module¶
- rxnutils.data.batch_utils.nlines(filename)¶
Count and return the number of lines in a file
- Parameters:
filename (str)
- Return type:
int
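The behaviour of nlines can be illustrated with a minimal sketch (a plain reimplementation for illustration, not the library's exact code):

```python
def nlines(filename: str) -> int:
    """Count the number of lines in a file without loading it into memory."""
    with open(filename, "rb") as fileobj:
        # Iterating over the file object yields one line at a time
        return sum(1 for _ in fileobj)
```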
- rxnutils.data.batch_utils.combine_batches(filename, nbatches, read_func, write_func, combine_func)¶
- Parameters:
filename (str)
nbatches (int)
read_func (Any)
write_func (Any)
combine_func (Any)
- Return type:
None
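The callback pattern behind combine_batches can be sketched as follows. This is an illustrative reimplementation, and the batch-file naming scheme (`<filename>.<index>`) is an assumption, not documented here:

```python
from typing import Any, Callable, List


def combine_batches(
    filename: str,
    nbatches: int,
    read_func: Callable[[str], Any],
    write_func: Callable[[Any, str], None],
    combine_func: Callable[[List[Any]], Any],
) -> None:
    """Read each batch file, combine the parts, and write the master file.

    Sketch only: assumes batch files are named `<filename>.<index>`,
    which may differ from the actual rxnutils convention.
    """
    batches = [read_func(f"{filename}.{idx}") for idx in range(nbatches)]
    write_func(combine_func(batches), filename)
```

The CSV, numpy-array, and sparse-matrix variants below plug format-specific readers, writers, and concatenation functions into this generic loop.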
- rxnutils.data.batch_utils.combine_csv_batches(filename, nbatches)¶
Combine CSV batches into one master file
The batch files are removed from disk
- Parameters:
filename (str) – the filename of the master file
nbatches (int) – the number of batches
- Return type:
None
- rxnutils.data.batch_utils.combine_numpy_array_batches(filename, nbatches)¶
Combine numpy array batches into one master file
The batch files are removed from disk
- Parameters:
filename (str) – the filename of the master file
nbatches (int) – the number of batches
- Return type:
None
- rxnutils.data.batch_utils.combine_sparse_matrix_batches(filename, nbatches)¶
Combine sparse matrix batches into one master file
The batch files are removed from disk
- Parameters:
filename (str) – the filename of the master file
nbatches (int) – the number of batches
- Return type:
None
- rxnutils.data.batch_utils.create_csv_batches(filename, nbatches, output_filename=None)¶
Create batches for reading a split CSV file
- The batches will be in the form of a tuple with three indices:
Batch index
Start index
End index
- Parameters:
filename (str) – the CSV file to make batches of
nbatches (int) – the number of batches
output_filename (str | None)
- Returns:
the created batches
- Return type:
List[Tuple[int, int, int]]
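One way such (batch index, start, end) triples can be computed is by evenly partitioning the row count; this is an illustrative partitioning sketch, not necessarily the exact scheme rxnutils uses (header handling and any output-file bookkeeping are omitted):

```python
from typing import List, Tuple


def create_batches(nlines: int, nbatches: int) -> List[Tuple[int, int, int]]:
    """Partition `nlines` rows into `nbatches` (batch index, start, end) triples.

    End indices are exclusive; the first `nlines % nbatches` batches each
    absorb one extra row so all rows are covered.
    """
    chunk_size = nlines // nbatches
    batches = []
    start = 0
    for idx in range(nbatches):
        end = start + chunk_size + (1 if idx < nlines % nbatches else 0)
        batches.append((idx, start, end))
        start = end
    return batches
```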
- rxnutils.data.batch_utils.read_csv_batch(filename, batch=None, **kwargs)¶
Read parts of a CSV file as specified by a batch
- Parameters:
filename (str) – the path to the CSV file on disc
batch (Tuple[int, ...]) – the batch specification as returned by create_csv_batches
kwargs (Any)
- Return type:
DataFrame
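The batch triple then selects a row range from the file. A minimal stdlib sketch of this selection, assuming the first CSV line is a header (note the real function returns a pandas DataFrame and forwards extra keyword arguments to the CSV reader):

```python
import csv
from itertools import islice
from typing import List, Tuple


def read_csv_batch(filename: str, batch: Tuple[int, int, int]) -> List[List[str]]:
    """Read data rows [start, end) of a CSV file, as indexed by a batch triple.

    Illustrative only: returns plain rows rather than a DataFrame.
    """
    _, start, end = batch
    with open(filename, newline="") as fileobj:
        reader = csv.reader(fileobj)
        next(reader)  # skip the header row (assumed present)
        return list(islice(reader, start, end))
```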
rxnutils.data.mapping module¶
Module containing script to atom-map USPTO or ORD reactions
- rxnutils.data.mapping.main(input_args=None)¶
Function for command-line tool
- Parameters:
input_args (Sequence[str] | None)
- Return type:
None
rxnutils.data.mapping_pipeline module¶
Module containing pipeline for mapping with rxnmapper. This needs to be run in an environment with rxnmapper installed
- class rxnutils.data.mapping_pipeline.RxnMappingFlow(use_cli=True)¶
Bases:
DataBaseFlow
Pipeline for atom-mapping USPTO or ORD data with rxnmapper
- data_prefix = metaflow.Parameter(name=data-prefix, kwargs={})¶
- start()¶
Set up batches for mapping
- do_mapping()¶
Perform atom-mapping of reactions
- join_mapping(_)¶
Join batches from mapping
- end()¶
Final step, just print information