rxnutils.data.uspto package¶
Submodules¶
rxnutils.data.uspto.combine module¶
Module containing a script to combine raw USPTO files
- It will:
preserve the ReactionSmiles and Year columns
create an ID from PatentNumber, ParagraphNum, and the row index in the original file
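The two behaviours listed above can be illustrated with a minimal, standard-library sketch. It is not the module's implementation: the tab-separated input layout, the `;` separator in the ID, and the helper name `combine` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical two-row excerpt; the tab-separated layout is an assumption.
RAW = (
    "ReactionSmiles\tPatentNumber\tParagraphNum\tYear\tTextMinedYield\tCalculatedYield\n"
    "CCO.CC(=O)O>>CC(=O)OCC\tUS0001\t0012\t1976\t85%\t83.2%\n"
    "c1ccccc1Br>>c1ccccc1O\tUS0002\t\t1980\t\t\n"
)


def combine(raw_text, sep=";"):
    """Keep ReactionSmiles and Year, and add an ID column (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    rows = []
    for index, row in enumerate(reader):
        # ID is built from PatentNumber, ParagraphNum and the row index.
        # The separator is an assumption; a missing ParagraphNum stays empty.
        reaction_id = sep.join(
            [row["PatentNumber"], row["ParagraphNum"] or "", str(index)]
        )
        rows.append(
            {
                "ID": reaction_id,
                "Year": row["Year"],
                "ReactionSmiles": row["ReactionSmiles"],
            }
        )
    return rows


combined = combine(RAW)
for row in combined:
    print(row)
```

Including the row index keeps IDs unique even when several reactions share a patent number and paragraph, or when the paragraph number is missing.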
- rxnutils.data.uspto.combine.main(args=None)¶
Function for command-line tool
- Parameters:
args (Sequence[str] | None)
- Return type:
None
rxnutils.data.uspto.download module¶
Module containing a script to download USPTO files from Figshare
- rxnutils.data.uspto.download.main(args=None)¶
Function for command-line tool
- Parameters:
args (Sequence[str] | None)
- Return type:
None
rxnutils.data.uspto.preparation_pipeline module¶
Module containing a pipeline for downloading, transforming and cleaning USPTO data. This needs to be run in an environment with rxnutils installed.
- class rxnutils.data.uspto.preparation_pipeline.UsptoDataPreparationFlow(use_cli=True)¶
Bases:
DataPreparationBaseFlow
Pipeline for downloading USPTO source files, combining them, and doing some clean-up
- data_prefix = 'uspto'¶
- start()¶
Download USPTO data from Figshare
- combine_files()¶
Combine USPTO data files and add IDs
- setup_cleaning()¶
Set up cleaning
- do_cleaning()¶
Perform cleaning of data
- join_cleaning(_)¶
Combine cleaned batches of data
- end()¶
Final step, just print information
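The step names `setup_cleaning`, `do_cleaning` and `join_cleaning` suggest a split, process, join pattern over batches of data. The sketch below shows that pattern in plain Python only; it does not use the flow class, and the helper names, the batch count and the placeholder cleaning rule are all assumptions rather than the pipeline's actual behaviour.

```python
def split_batches(rows, nbatches):
    """Split rows into nbatches contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), nbatches)
    batches, start = [], 0
    for i in range(nbatches):
        end = start + size + (1 if i < extra else 0)
        batches.append(rows[start:end])
        start = end
    return batches


def clean_batch(batch):
    """Placeholder cleaning (assumption): strip whitespace, keep rows containing '>'."""
    return [smiles.strip() for smiles in batch if ">" in smiles]


def join_batches(batches):
    """Concatenate cleaned batches back into one list, preserving order."""
    return [row for batch in batches for row in batch]


rows = ["CCO>>CC=O ", "not a reaction", "CC(=O)O.CCO>>CC(=O)OCC", "C=C>>CC", ""]

batches = split_batches(rows, nbatches=2)        # analogous to setup_cleaning
cleaned = [clean_batch(b) for b in batches]      # analogous to do_cleaning, once per batch
result = join_batches(cleaned)                   # analogous to join_cleaning

print(len(rows), "rows in,", len(result), "rows out")
```

Because each batch is cleaned independently, the per-batch calls could run in parallel, and the join step only needs to concatenate the results.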
rxnutils.data.uspto.uspto_yield module¶
Code for curating USPTO yields.
Inspiration from this code: https://github.com/DocMinus/Yield_curation_USPTO
This could potentially be an action, but since it only makes sense to use it with USPTO data, it resides here for now.
- class rxnutils.data.uspto.uspto_yield.UsptoYieldCuration(text_yield_column='TextMinedYield', calc_yield_column='CalculatedYield', out_column='CuratedYield')¶
Bases:
object
Action for curating USPTO yield columns
- Parameters:
text_yield_column (str)
calc_yield_column (str)
out_column (str)
- text_yield_column: str = 'TextMinedYield'¶
- calc_yield_column: str = 'CalculatedYield'¶
- out_column: str = 'CuratedYield'¶
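To make the roles of the three column parameters concrete, here is a minimal sketch of one plausible curation rule. It is an illustration under stated assumptions, not the class's confirmed logic: the percent-string format, the 0 to 100 validity range, the choice of the larger value when both yields are present, and the helper names `parse_percent` and `curate_yield` are all hypothetical.

```python
import math


def parse_percent(value):
    """Parse strings such as '85%' or '~90%'; return NaN if unusable or outside 0-100."""
    if value is None:
        return math.nan
    text = str(value).strip().lstrip("~<>").rstrip("%").strip()
    try:
        number = float(text)
    except ValueError:
        return math.nan
    return number if 0.0 <= number <= 100.0 else math.nan


def curate_yield(
    row,
    text_yield_column="TextMinedYield",
    calc_yield_column="CalculatedYield",
    out_column="CuratedYield",
):
    """Return a copy of row with a curated yield added (hypothetical rule)."""
    text_yield = parse_percent(row.get(text_yield_column))
    calc_yield = parse_percent(row.get(calc_yield_column))
    if math.isnan(text_yield):
        curated = calc_yield          # may itself be NaN
    elif math.isnan(calc_yield):
        curated = text_yield
    else:
        # Assumption: prefer the larger of the two valid values.
        curated = max(text_yield, calc_yield)
    return {**row, out_column: curated}


example = curate_yield({"TextMinedYield": "85%", "CalculatedYield": "83.2%"})
print(example["CuratedYield"])
```

The default column names mirror the defaults in the class signature, so the sketch reads the same two input columns and writes the same output column.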