rxnutils.data.uspto package

Submodules

rxnutils.data.uspto.combine module

Module containing a script to combine raw USPTO files

It will:
  • preserve the ReactionSmiles and Year columns

  • create an ID from PatentNumber, ParagraphNum, and the row index in the original file
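The two steps above can be sketched in plain Python. This is only an illustration: the helper name `add_ids`, the `;` separator, and the sample rows are assumptions, not the format the script actually writes.

```python
def add_ids(rows):
    """Return new rows that keep ReactionSmiles and Year and gain an ID.

    The ID layout (fields joined with ';') is a hypothetical choice
    made for this sketch only.
    """
    combined = []
    for index, row in enumerate(rows):
        combined.append(
            {
                # ID built from patent number, paragraph number and row index
                "ID": f"{row['PatentNumber']};{row['ParagraphNum']};{index}",
                # Only these two original columns are preserved
                "ReactionSmiles": row["ReactionSmiles"],
                "Year": row["Year"],
            }
        )
    return combined


# Made-up sample rows standing in for one raw USPTO file
rows = [
    {"PatentNumber": "US0001", "ParagraphNum": "0012",
     "ReactionSmiles": "CCO>>CC=O", "Year": 1999},
    {"PatentNumber": "US0001", "ParagraphNum": "0012",
     "ReactionSmiles": "CC=O>>CC(=O)O", "Year": 1999},
]
combined = add_ids(rows)
```

Including the row index keeps IDs unique even when two reactions come from the same patent paragraph, as in the sample rows.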

rxnutils.data.uspto.combine.main(args=None)

Function for command-line tool

Parameters:

args (Sequence[str] | None)

Return type:

None

rxnutils.data.uspto.download module

Module containing a script to download USPTO files from Figshare

rxnutils.data.uspto.download.main(args=None)

Function for command-line tool

Parameters:

args (Sequence[str] | None)

Return type:

None

rxnutils.data.uspto.preparation_pipeline module

Module containing a pipeline for downloading, transforming, and cleaning USPTO data. It needs to be run in an environment with rxnutils installed.

class rxnutils.data.uspto.preparation_pipeline.UsptoDataPreparationFlow(use_cli=True)

Bases: DataPreparationBaseFlow

Pipeline for downloading the USPTO source files, combining them, and doing some clean-up

data_prefix = 'uspto'

start()

Download USPTO data from Figshare

combine_files()

Combine USPTO data files and add IDs

setup_cleaning()

Set up cleaning

do_cleaning()

Perform cleaning of data

join_cleaning(_)

Combine cleaned batches of data

end()

Final step, just print information
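The split, clean, and join pattern suggested by the setup_cleaning(), do_cleaning(), and join_cleaning() steps can be sketched as follows. This is not the flow's implementation: the function names, the batch count, and the placeholder cleaning rule (dropping blank entries) are all assumptions made for illustration.

```python
def split_batches(items, n_batches):
    """Split items into at most n_batches contiguous chunks."""
    size = max(1, -(-len(items) // n_batches))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def clean_batch(batch):
    """Placeholder cleaning rule: drop blank reaction strings."""
    return [rxn for rxn in batch if rxn.strip()]


def run_pipeline(reactions, n_batches=2):
    # Analogous to setup_cleaning: decide how the data is batched
    batches = split_batches(reactions, n_batches)
    # Analogous to do_cleaning: each batch is cleaned independently
    cleaned = [clean_batch(batch) for batch in batches]
    # Analogous to join_cleaning: merge the cleaned batches in order
    return [rxn for batch in cleaned for rxn in batch]


result = run_pipeline(["CCO>>CC=O", "", "CC=O>>CC(=O)O", "  "], n_batches=2)
```

Because each batch is cleaned without reference to the others, the middle step could in principle run in parallel, which is the usual reason for structuring a pipeline this way.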

rxnutils.data.uspto.uspto_yield module

Code for curating USPTO yields.

Inspiration from this code: https://github.com/DocMinus/Yield_curation_USPTO

This could potentially be an action, but since it only makes sense to use it with USPTO data, it resides here for now.

class rxnutils.data.uspto.uspto_yield.UsptoYieldCuration(text_yield_column='TextMinedYield', calc_yield_column='CalculatedYield', out_column='CuratedYield')

Bases: object

Action for curating USPTO yield columns

Parameters:
  • text_yield_column (str)

  • calc_yield_column (str)

  • out_column (str)

text_yield_column: str = 'TextMinedYield'
calc_yield_column: str = 'CalculatedYield'
out_column: str = 'CuratedYield'
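To make the role of the three column names concrete, here is a minimal sketch of one plausible curation rule. The rule itself is an assumption and is not documented on this page: both yield columns are parsed to numbers, values above 100 are discarded, and the larger remaining value is written to the output column. The helpers `parse_yield` and `curate_yield` are hypothetical and are not part of rxnutils.

```python
import re


def parse_yield(value):
    """Parse strings such as '85%', '~85%' or '70 to 80%' into a float.

    Returns None when no number is found or the value exceeds 100.
    For a range, the last number (the upper bound) is kept; this is
    an assumption of the sketch.
    """
    if value is None:
        return None
    numbers = re.findall(r"\d+(?:\.\d+)?", str(value))
    if not numbers:
        return None
    parsed = float(numbers[-1])
    return parsed if parsed <= 100.0 else None


def curate_yield(
    row,
    text_yield_column="TextMinedYield",
    calc_yield_column="CalculatedYield",
    out_column="CuratedYield",
):
    """Return a copy of row with a curated yield in out_column."""
    text = parse_yield(row.get(text_yield_column))
    calc = parse_yield(row.get(calc_yield_column))
    candidates = [v for v in (text, calc) if v is not None]
    # Hypothetical rule: prefer the larger of the two usable values
    curated = max(candidates) if candidates else None
    return {**row, out_column: curated}


example = curate_yield({"TextMinedYield": "~85%", "CalculatedYield": "90.5%"})
```

The default argument values mirror the class defaults listed above, so a row from the combined USPTO data could be passed without renaming columns.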

Module contents