process

heliopy.data.util.process(dirs, fnames, extension, local_base_dir, remote_base_url, download_func, processing_func, starttime, endtime, try_download=True, units=None, processing_kwargs={}, download_info=[], remote_fnames=None, warn_missing_units=True)

The main utility method for systematically loading, downloading, and saving data.

Parameters:
  • dirs (list) – A list of directories relative to local_base_dir.
  • fnames (list or str or regex) – A list of filenames without their extension. These are the filenames that will be downloaded from the remote source. Must be the same length as dirs. Each filename is saved in its respective entry in dirs. Can also be a regular expression that is used to match the filename (e.g. for version numbers).
  • extension (str) – File extension of the raw files. Must include leading dot.
  • local_base_dir (str) – Local base directory. fnames[i] will be stored in local_base_dir / dirs[i] / fnames[i] + extension.
  • remote_base_url (str) – Remote base URL. fnames[i] will be downloaded from remote_base_url / dirs[i] / fnames[i] + extension.
  • download_func

    Function that takes

    • The remote base url
    • The local base directory
    • The relative directory (relative to the base url)
    • The local filename to download to
    • The remote filename
    • A file extension

    and downloads the remote file. The signature must be:

    def download_func(remote_base_url, local_base_dir,
                      directory, fname, remote_fname, extension)
    

    The function can also return the path of the file it downloaded, if this is different to the filename it is given. download_func can either silently do nothing if a given file is not available, or raise a NoDataError with a descriptive error message that will be printed.

  • processing_func

    Function that takes an open CDF file or open plain text file, and returns a pandas DataFrame. The signature must be:

    def processing_func(file, **processing_kwargs)
    
  • starttime (datetime) – Start of requested interval.
  • endtime (datetime) – End of requested interval.
  • try_download (bool, optional) – If True, try to download data; if False, do not. Default is True.
  • units (OrderedDict, optional) –

    Manually defined units to be attached to the data that will be returned.

    Must map column headers (strings) to Quantity objects. If units are present, a TimeSeries object is returned; otherwise a pandas DataFrame is returned.

  • processing_kwargs (dict, optional) – Extra keyword arguments to be passed to the processing function.
  • download_info (list, optional) – A list with the same length as fnames, which contains extra info that is handed to download_func for each file individually.
  • remote_fnames (list of str, optional) – If the remote filenames are different from the desired downloaded filenames, this should be a list of length len(fnames) with the files to be downloaded. The ordering must be the same as fnames.
  • warn_missing_units (bool, optional) – If True, warnings will be shown for each variable that does not have associated units.
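
As an illustration of the download_func signature described above, the following is a minimal sketch rather than the library's own implementation. It assumes the remote file lives at remote_base_url / directory / remote_fname + extension and uses only the standard library. The NoDataError class defined here is a hypothetical stand-in so that the example is self-contained; real code would raise the NoDataError referred to above.

```python
import os
import urllib.error
import urllib.request


class NoDataError(RuntimeError):
    """Hypothetical stand-in for the NoDataError mentioned above."""


def download_func(remote_base_url, local_base_dir,
                  directory, fname, remote_fname, extension):
    # Assumed remote layout: remote_base_url / directory / remote_fname + extension
    url = '/'.join([remote_base_url.rstrip('/'),
                    str(directory).strip('/'),
                    remote_fname + extension])
    # Local target: local_base_dir / directory / fname + extension
    local_dir = os.path.join(local_base_dir, directory)
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, fname + extension)
    try:
        urllib.request.urlretrieve(url, local_path)
    except urllib.error.URLError as err:
        # Report a missing or unreachable file with a descriptive message
        raise NoDataError(f'Could not download {url}: {err}') from err
    # Returning the path is optional, as noted above
    return local_path
```

Raising on failure is one of the two behaviours permitted above; a function that silently returns when the file is unavailable would also satisfy the description.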
Returns:

Requested data.

Return type:

DataFrame or TimeSeries
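
The fnames parameter may be a regular expression, for example when a file's version number is not known in advance. The sketch below only illustrates how such a pattern can match versioned filenames. The filenames are hypothetical, and choosing the highest matching version is an assumption made for this example, not a statement of how process resolves multiple matches.

```python
import re

# Hypothetical filenames (without extension) found in one remote directory
available = ['mag_20100101_v01', 'mag_20100101_v02', 'mag_20100102_v01']

# Match any two-digit version of the file for 2010-01-01
pattern = re.compile(r'mag_20100101_v\d{2}')

# Keep only names that match the whole pattern, in sorted order
matches = sorted(name for name in available if pattern.fullmatch(name))

# Assumption for this sketch: take the highest version number
latest = matches[-1]
```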