rxnutils.data package

Subpackages

Submodules

rxnutils.data.base_pipeline module

Module containing base class for data pipelines

class rxnutils.data.base_pipeline.DataBaseFlow(use_cli=True)

Bases: FlowSpec

Base class for data-processing pipelines

nbatches = metaflow.Parameter(name="nbatches", type=int, required=True, show_default=True)

folder = metaflow.Parameter(name="folder", type=str, default=".", required=False, show_default=True)

class rxnutils.data.base_pipeline.DataPreparationBaseFlow(use_cli=True)

Bases: DataBaseFlow

Base pipeline for preparing datasets and doing clean-up

data_prefix = ''
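
Concrete pipelines are written as metaflow flows that subclass these base classes. The following is a minimal, hypothetical sketch (the flow and step names are not part of rxnutils) showing how the inherited nbatches and folder parameters become available inside the steps:

    from metaflow import step

    from rxnutils.data.base_pipeline import DataBaseFlow


    class ExampleDataFlow(DataBaseFlow):
        """Hypothetical pipeline illustrating the inherited parameters"""

        @step
        def start(self):
            # nbatches and folder are Parameters inherited from DataBaseFlow,
            # exposed on the command line as --nbatches and --folder
            print(f"Processing {self.nbatches} batches in {self.folder}")
            self.next(self.end)

        @step
        def end(self):
            print("Done")


    if __name__ == "__main__":
        ExampleDataFlow()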

rxnutils.data.batch_utils module

rxnutils.data.batch_utils.nlines(filename)

Count and return the number of lines in a file

Parameters:

filename (str)

Return type:

int
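
For example (the file name is hypothetical):

    from rxnutils.data.batch_utils import nlines

    # Total number of lines in the file, including any header row
    total = nlines("reactions.csv")
    print(total)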

rxnutils.data.batch_utils.combine_batches(filename, nbatches, read_func, write_func, combine_func)

Combine batch files into one master file, using the supplied callables to read, combine, and write the data

Parameters:
  • filename (str)

  • nbatches (int)

  • read_func (Any)

  • write_func (Any)

  • combine_func (Any)

Return type:

None

rxnutils.data.batch_utils.combine_csv_batches(filename, nbatches)

Combine CSV batches into one master file

The batch files are removed from disc

Parameters:
  • filename (str) – the filename of the master file

  • nbatches (int) – the number of batches

Return type:

None
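
A minimal usage sketch, assuming four batch files have been produced for a hypothetical master file output.csv:

    from rxnutils.data.batch_utils import combine_csv_batches

    # Merge the four CSV batch files belonging to "output.csv" into a
    # single master file; the batch files are deleted afterwards
    combine_csv_batches("output.csv", nbatches=4)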

rxnutils.data.batch_utils.combine_sparse_matrix_batches(filename, nbatches)

Combine sparse matrix batches into one master file

The batch files are removed from disc

Parameters:
  • filename (str) – the filename of the master file

  • nbatches (int) – the number of batches

Return type:

None
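
A minimal usage sketch, analogous to combine_csv_batches; the .npz file name is only an assumption for illustration:

    from rxnutils.data.batch_utils import combine_sparse_matrix_batches

    # Merge the sparse-matrix batch files belonging to the hypothetical
    # master file "features.npz"; the batch files are deleted afterwards
    combine_sparse_matrix_batches("features.npz", nbatches=4)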

rxnutils.data.batch_utils.create_csv_batches(filename, nbatches, output_filename=None)

Create batches for reading a split CSV file

Each batch is a tuple of three indices:
  • Batch index

  • Start index

  • End index

Parameters:
  • filename (str) – the CSV file to make batches of

  • nbatches (int) – the number of batches

  • output_filename (str | None)

Returns:

the created batches

Return type:

List[Tuple[int, int, int]]
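
A minimal usage sketch (the file name is hypothetical):

    from rxnutils.data.batch_utils import create_csv_batches

    # Split the rows of "reactions.csv" into four batches of
    # (batch index, start index, end index) tuples
    batches = create_csv_batches("reactions.csv", nbatches=4)
    for batch_idx, start, end in batches:
        print(batch_idx, start, end)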

rxnutils.data.batch_utils.read_csv_batch(filename, batch=None, **kwargs)

Read parts of a CSV file as specified by a batch

Parameters:
  • filename (str) – the path to the CSV file on disc

  • batch (Tuple[int, ...] | None) – the batch specification as returned by create_csv_batches

  • kwargs (Any)

Return type:

DataFrame
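
A sketch of reading a single batch; it is assumed here that the extra keyword arguments are passed on to the underlying pandas CSV reader:

    from rxnutils.data.batch_utils import create_csv_batches, read_csv_batch

    batches = create_csv_batches("reactions.csv", nbatches=4)

    # Read only the rows belonging to the first batch; "sep" is assumed
    # to be forwarded to the pandas CSV reader
    df = read_csv_batch("reactions.csv", batch=batches[0], sep=",")
    print(df.shape)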

rxnutils.data.mapping module

Module containing script to atom-map USPTO or ORD reactions

rxnutils.data.mapping.main(input_args=None)

Function for command-line tool

Parameters:

input_args (Sequence[str] | None)

Return type:

None

rxnutils.data.mapping_pipeline module

Module containing pipeline for mapping with rxnmapper. This needs to be run in an environment with rxnmapper installed

class rxnutils.data.mapping_pipeline.RxnMappingFlow(use_cli=True)

Bases: DataBaseFlow

Pipeline for atom-mapping USPTO or ORD data with rxnmapper

data_prefix = metaflow.Parameter(name="data-prefix", type=str, required=False, show_default=True)

start()

Set up batches for mapping

do_mapping()

Perform atom-mapping of reactions

join_mapping(_)

Join batches from mapping

end()

Final step, just print information

Module contents