rxnutils.data package

Subpackages

Submodules

rxnutils.data.base_pipeline module

Module containing base class for data pipelines

class rxnutils.data.base_pipeline.DataBaseFlow(use_cli=True)

Bases: FlowSpec

Base class for data-processing pipelines

nbatches = metaflow.Parameter(name="nbatches", type=int, required=True, show_default=True)

folder = metaflow.Parameter(name="folder", type=str, default=".", required=False, show_default=True)

class rxnutils.data.base_pipeline.DataPreparationBaseFlow(use_cli=True)

Bases: DataBaseFlow

Base pipeline for preparing datasets and doing clean-up

data_prefix = ''
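
Concrete pipelines are written as metaflow flows that subclass these base classes. The following is a minimal, hypothetical sketch (the flow and step names are not part of rxnutils) showing how the inherited nbatches and folder parameters become available inside the steps:

    from metaflow import step

    from rxnutils.data.base_pipeline import DataBaseFlow


    class ExampleDataFlow(DataBaseFlow):
        """Hypothetical pipeline illustrating the inherited parameters"""

        @step
        def start(self):
            # nbatches and folder are Parameters inherited from DataBaseFlow,
            # exposed on the command line as --nbatches and --folder
            print(f"Processing {self.nbatches} batches in {self.folder}")
            self.next(self.end)

        @step
        def end(self):
            print("Done")


    if __name__ == "__main__":
        ExampleDataFlow()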

rxnutils.data.batch_utils module

rxnutils.data.batch_utils.nlines(filename)

Count and return the number of lines in a file

Parameters:

filename (str)

Return type:

int
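
For example (the file name is hypothetical):

    from rxnutils.data.batch_utils import nlines

    # Total number of lines in the file, including any header row
    total = nlines("reactions.csv")
    print(total)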

rxnutils.data.batch_utils.combine_batches(filename, nbatches, read_func, write_func, combine_func)

Combine batch files into one master file, using the supplied callables to read, combine, and write the data

Parameters:
  • filename (str)

  • nbatches (int)

  • read_func (Any)

  • write_func (Any)

  • combine_func (Any)

Return type:

None

rxnutils.data.batch_utils.combine_csv_batches(filename, nbatches)

Combine CSV batches into one master file

The batch files are removed from disc

Parameters:
  • filename (str) – the filename of the master file

  • nbatches (int) – the number of batches

Return type:

None
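
A minimal usage sketch, assuming four batch files have been produced for a hypothetical master file output.csv:

    from rxnutils.data.batch_utils import combine_csv_batches

    # Merge the four CSV batch files belonging to "output.csv" into a
    # single master file; the batch files are deleted afterwards
    combine_csv_batches("output.csv", nbatches=4)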

rxnutils.data.batch_utils.combine_sparse_matrix_batches(filename, nbatches)

Combine sparse matrix batches into one master file

The batch files are removed from disc

Parameters:
  • filename (str) – the filename of the master file

  • nbatches (int) – the number of batches

Return type:

None
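
A minimal usage sketch, analogous to combine_csv_batches; the .npz file name is only an assumption for illustration:

    from rxnutils.data.batch_utils import combine_sparse_matrix_batches

    # Merge the sparse-matrix batch files belonging to the hypothetical
    # master file "features.npz"; the batch files are deleted afterwards
    combine_sparse_matrix_batches("features.npz", nbatches=4)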

rxnutils.data.batch_utils.create_csv_batches(filename, nbatches, output_filename=None)

Create batches for reading a split CSV file

Each batch is a tuple of three indices:
  • Batch index

  • Start index

  • End index

Parameters:
  • filename (str) – the CSV file to make batches of

  • nbatches (int) – the number of batches

  • output_filename (str | None)

Returns:

the created batches

Return type:

List[Tuple[int, int, int]]
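
A minimal usage sketch (the file name is hypothetical):

    from rxnutils.data.batch_utils import create_csv_batches

    # Split the rows of "reactions.csv" into four batches of
    # (batch index, start index, end index) tuples
    batches = create_csv_batches("reactions.csv", nbatches=4)
    for batch_idx, start, end in batches:
        print(batch_idx, start, end)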

rxnutils.data.batch_utils.read_csv_batch(filename, batch=None, **kwargs)

Read parts of a CSV file as specified by a batch

Parameters:
  • filename (str) – the path to the CSV file on disc

  • batch (Tuple[int, ...] | None) – the batch specification as returned by create_csv_batches

  • kwargs (Any)

Return type:

DataFrame
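
A sketch of reading a single batch; it is assumed here that the extra keyword arguments are passed on to the underlying pandas CSV reader:

    from rxnutils.data.batch_utils import create_csv_batches, read_csv_batch

    batches = create_csv_batches("reactions.csv", nbatches=4)

    # Read only the rows belonging to the first batch; "sep" is assumed
    # to be forwarded to the pandas CSV reader
    df = read_csv_batch("reactions.csv", batch=batches[0], sep=",")
    print(df.shape)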

rxnutils.data.mapping module

Module containing script to atom-map USPTO or ORD reactions

rxnutils.data.mapping.main(input_args=None)

Function for command-line tool

Parameters:

input_args (Sequence[str] | None)

Return type:

None

rxnutils.data.mapping_pipeline module

Module containing pipeline for mapping with rxnmapper. This needs to be run in an environment with rxnmapper installed

class rxnutils.data.mapping_pipeline.RxnMappingFlow(use_cli=True)

Bases: DataBaseFlow

Pipeline for atom-mapping USPTO or ORD data with rxnmapper

data_prefix = metaflow.Parameter(name="data-prefix", type=str, required=False, show_default=True)

start()

Set up batches for mapping

do_mapping()

Perform atom-mapping of reactions

join_mapping(_)

Join batches from mapping

end()

Final step, just print information

Module contents