💾 Read & Write

Submodules

hypergraphx.readwrite.load module

hypergraphx.readwrite.load.download_remote_dataset(dataset_name, *, fmt='hgx', timeout=30, verify_ssl=False, cache_dir=None, overwrite=False, catalog_url=None, use_catalog=True, dataset_info=None)

Download and cache a remote dataset without loading it into memory.

Parameters:

dataset_name (str) – Dataset identifier, such as "zoo" or "contacts-hospital".
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to download. If explicitly set to None, JSON URLs are tried first, then binary URLs.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when a matching cached file exists.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
use_catalog (bool, default=True) – If True, resolve download URLs from the remote catalog before falling back to legacy hard-coded URL patterns.
dataset_info (dict, optional) – Already loaded catalog entry. Passing this avoids reloading the catalog when downloading many datasets.

Returns:

Local decompressed cache path, suitable for load_hypergraph(...).

Return type:

pathlib.Path
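The cache_dir default above can be read as a simple precedence rule. The sketch below assumes that an explicit argument wins over HYPERGRAPHX_DATA_CACHE, which in turn wins over ~/.cache/hypergraphx/datasets; resolve_cache_dir is a hypothetical helper, not part of hypergraphx. The download_zoo wrapper shows the documented call pattern and only runs when called, since it needs hypergraphx and network access.

```python
import os
from pathlib import Path


def resolve_cache_dir(cache_dir=None):
    """Hypothetical helper mirroring the documented cache_dir default.

    Assumed precedence: explicit argument, then the HYPERGRAPHX_DATA_CACHE
    environment variable, then ~/.cache/hypergraphx/datasets.
    """
    if cache_dir is not None:
        return Path(cache_dir).expanduser()
    env_value = os.environ.get("HYPERGRAPHX_DATA_CACHE")
    if env_value:
        return Path(env_value).expanduser()
    return Path.home() / ".cache" / "hypergraphx" / "datasets"


def download_zoo():
    # Real calls; requires hypergraphx and network access.
    from hypergraphx.readwrite import download_remote_dataset, load_hypergraph

    path = download_remote_dataset("zoo")  # pathlib.Path to the decompressed cached file
    return load_hypergraph(path)
```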

hypergraphx.readwrite.load.download_remote_datasets(dataset_names=None, *, attributes=None, match_all=True, fmt='hgx', timeout=30, verify_ssl=False, cache_dir=None, overwrite=False, catalog_url=None, continue_on_error=False, progress_callback=None)

Download and cache multiple remote datasets.

Parameters:

dataset_names (str | Iterable[str], optional) – Dataset names, filenames, or directories to download explicitly.
attributes (str | Iterable[str], optional) – Tag/category names used to select datasets from the catalog. If both dataset_names and attributes are provided, named datasets are filtered by the requested attributes.
match_all (bool, default=True) – If True, selected datasets must contain all requested attributes. If False, any requested attribute is enough.
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to download.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when matching cached files exist.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
continue_on_error (bool, default=False) – If True, keep downloading after a dataset fails and store the exception in that dataset’s result record. If False, raise on the first failure.
progress_callback (callable, optional) – Called after each dataset with its result record.

Returns:

Mapping from canonical dataset name to records with path, metadata, error, and status fields.

Return type:

dict
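A progress_callback receives one result record per dataset. This sketch assumes each record is a dict carrying the documented path, metadata, error, and status fields, and sorts records by whether error is set; make_progress_tracker is a hypothetical helper, and no particular status values are assumed.

```python
def make_progress_tracker():
    """Return (callback, summary) for use as progress_callback.

    Assumes each record is a dict with "path", "metadata", "error",
    and "status" keys; only "error" is inspected here.
    """
    summary = {"done": [], "failed": []}

    def on_progress(record):
        # A non-None error means this dataset failed (continue_on_error=True).
        if record.get("error") is not None:
            summary["failed"].append(record)
        else:
            summary["done"].append(record)

    return on_progress, summary


def download_undirected():
    # Real call; requires hypergraphx and network access.
    from hypergraphx.readwrite import download_remote_datasets

    callback, summary = make_progress_tracker()
    results = download_remote_datasets(
        attributes="Undirected",
        continue_on_error=True,  # record failures instead of raising
        progress_callback=callback,
    )
    return results, summary
```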

hypergraphx.readwrite.load.get_remote_dataset_info(dataset_name, *, timeout=30, verify_ssl=False, catalog_url=None)

Return the full catalog entry for a remote dataset.

dataset_name is matched against the catalog name, filename, and directory fields.
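The matching rule can be sketched over an in-memory list of catalog entries. find_catalog_entry is hypothetical and assumes the catalog uses "name", "filename", and "directory" keys with exact string matching. The wrapper shows how the returned entry can be reused through the documented dataset_info parameter to avoid a second catalog download.

```python
def find_catalog_entry(entries, dataset_name):
    """Hypothetical lookup mirroring get_remote_dataset_info's matching rule.

    Assumes entries are dicts whose "name", "filename", and "directory"
    keys hold strings; exact, case-sensitive matching is an assumption.
    """
    for entry in entries:
        candidates = (entry.get("name"), entry.get("filename"), entry.get("directory"))
        if dataset_name in candidates:
            return entry
    raise KeyError(dataset_name)


def load_with_cached_info(name):
    # Real calls; fetch the catalog entry once, then pass it as dataset_info.
    from hypergraphx.readwrite import get_remote_dataset_info, load_hypergraph_from_server

    info = get_remote_dataset_info(name)
    return load_hypergraph_from_server(name, dataset_info=info)
```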

hypergraphx.readwrite.load.iter_remote_hypergraphs(attributes=None, *, names=None, match_all=True, fmt='hgx', timeout=30, verify_ssl=False, catalog_url=None, include_metadata=False, store=True, cache_dir=None, overwrite=False)

Yield remote hypergraphs selected by name or catalog tags/categories.

Parameters:

attributes (str | Iterable[str], optional) – Tag/category names to match, such as "Undirected" or ["Undirected", "Temporal"]. Matching is case-insensitive.
names (str | Iterable[str], optional) – Dataset names, filenames, or directories to load explicitly. If omitted, datasets are selected from attributes.
match_all (bool, default=True) – If True, a dataset must contain all requested attributes. If False, any requested attribute is enough.
fmt ({"hgx", "binary", "json"}, default="hgx") – Remote format to load for each matching dataset.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates for remote requests.
catalog_url (str, optional) – Catalog metadata URL used for filtering.
include_metadata (bool, default=False) – If True, yield (hypergraph, dataset_info) pairs. Otherwise yield only the hypergraph object.
store (bool, default=True) – Store downloaded datasets locally before loading them.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download matching datasets even when cached files exist.

Notes

This is a generator: datasets are downloaded and loaded lazily as the iterator advances.
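The attributes/match_all semantics amount to a subset test (match_all=True) or an intersection test (match_all=False) on lower-cased tags. matches_attributes is a hypothetical helper; treating a single string as one attribute is an assumption. The wrapper shows why laziness matters: calling next() on the generator fetches only the first matching dataset.

```python
def matches_attributes(entry_tags, attributes, match_all=True):
    """Case-insensitive tag test in the spirit of attributes/match_all.

    A single string is treated as one attribute (assumption).
    """
    if isinstance(attributes, str):
        attributes = [attributes]
    wanted = {a.lower() for a in attributes}
    have = {t.lower() for t in entry_tags}
    # All requested tags present, or at least one of them.
    return wanted <= have if match_all else bool(wanted & have)


def first_undirected_temporal():
    # Real call; requires hypergraphx and network access.
    from hypergraphx.readwrite import iter_remote_hypergraphs

    it = iter_remote_hypergraphs(["Undirected", "Temporal"], include_metadata=True)
    hypergraph, info = next(it)  # downloads only the first match
    return hypergraph, info
```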

hypergraphx.readwrite.load.list_remote_datasets(*, timeout=30, verify_ssl=False, catalog_url=None)

List datasets advertised by the remote Hypergraphx-data catalog.

Returns a list of dictionaries with at least the following keys:

- name
- tags / categories
- vertices
- edges

Parameters:

timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates when downloading the catalog. Defaults to False for compatibility with the current dataset server.
catalog_url (str, optional) – Catalog metadata URL. Defaults to the Hypergraphx-data GitHub raw URL, or HYPERGRAPHX_DATA_CATALOG_URL if set.

Notes

catalog_url can point to the generated catalog.json file, a JSON list, or the legacy related-data.js file used by the website.
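Since each entry is documented to carry at least name, vertices, and edges, a quick size overview needs only those keys. summarize_catalog is a hypothetical helper; print_remote_catalog shows the real call and runs only when invoked.

```python
def summarize_catalog(entries):
    """Format catalog entries as "name: V vertices, E edges" lines,
    using only the keys list_remote_datasets documents."""
    return [
        f"{entry['name']}: {entry['vertices']} vertices, {entry['edges']} edges"
        for entry in entries
    ]


def print_remote_catalog():
    # Real call; requires hypergraphx and network access.
    from hypergraphx.readwrite import list_remote_datasets

    for line in summarize_catalog(list_remote_datasets()):
        print(line)
```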

hypergraphx.readwrite.load.load(obj_or_path)

hypergraphx.readwrite.load.load_hypergraph(file_name, *, fmt=None)

Load a hypergraph from disk.

Parameters:

file_name (str or path-like) – Input file path.
fmt ({"json", "pickle", "hgr"} | None) – Optional override for the input format. If None (default), infer format from the file extension. Gzipped files with .gz suffix are supported for each local format, such as .json.gz and .hgx.gz.
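Extension-based inference with an optional .gz suffix can be sketched with pathlib. infer_format is hypothetical: it returns the raw extension, and how hypergraphx maps extensions to its own format names is not assumed here. When the extension is ambiguous, passing fmt explicitly (for example load_hypergraph("graph.bin", fmt="pickle")) sidesteps inference.

```python
from pathlib import PurePath


def infer_format(file_name):
    """Split a path into (extension, gzipped) from its suffixes.

    Hypothetical helper: "zoo.json.gz" -> ("json", True).
    """
    suffixes = [s.lower() for s in PurePath(file_name).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # drop the compression layer
    if not suffixes:
        raise ValueError(f"cannot infer format from {file_name!r}")
    return suffixes[-1].lstrip("."), gzipped
```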

hypergraphx.readwrite.load.load_hypergraph_from_server(dataset_name, *, fmt='hgx', as_dict=False, timeout=30, verify_ssl=False, store=True, cache_dir=None, overwrite=False, catalog_url=None, use_catalog=True, dataset_info=None)

Load a dataset by name from the remote Hypergraphx-data server.

Parameters:

dataset_name (str) – Dataset identifier, such as "zoo" or "contacts-hospital".
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to load. "hgx" and "binary" load the compact binary Hypergraphx format; "json" loads the JSON format. If explicitly set to None, JSON URLs are tried first, then binary URLs.
as_dict (bool, default=False) – If True, return the exposed internal data-structure dictionary instead of a hypergraph object.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates. Defaults to False for compatibility with the current dataset server certificate chain.
store (bool, default=True) – Store the decompressed remote dataset locally before loading it. Cached files are reused on later calls.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when a matching cached file exists.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
use_catalog (bool, default=True) – If True, resolve download URLs from the remote catalog before falling back to legacy hard-coded URL patterns.
dataset_info (dict, optional) – Already loaded catalog entry. Passing this avoids reloading the catalog when loading many datasets.

Returns:

Loaded hypergraph object, or its exposed dictionary if as_dict=True.

Return type:

Hypergraph | DirectedHypergraph | TemporalHypergraph | MultiplexHypergraph | dict

Notes

The loader tries current per-dataset .json.gz / .hgx.gz URLs first and keeps older flat URLs as fallbacks. When store=True, compressed downloads are decompressed before being written to the cache.
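The fmt rules and the URL fallback described above can be sketched as two small functions. candidate_formats encodes only what the fmt description states; fetch_first is a hypothetical fallback loop with a caller-supplied fetch callable, not the library's download code, and no real URL patterns are assumed. A typical real call is load_hypergraph_from_server("contacts-hospital", fmt="json", as_dict=True).

```python
def candidate_formats(fmt="hgx"):
    """Order in which remote formats are tried, per the fmt description:
    None means JSON first, then binary; "hgx" and "binary" both mean
    the compact binary Hypergraphx format."""
    if fmt is None:
        return ["json", "binary"]
    if fmt in ("hgx", "binary"):
        return ["binary"]
    if fmt == "json":
        return ["json"]
    raise ValueError(f"unsupported fmt: {fmt!r}")


def fetch_first(urls, fetch):
    """Try each URL in order and return the first successful payload.

    `fetch` is a hypothetical callable that raises OSError on failure,
    so newer URLs listed first win and older ones act as fallbacks.
    """
    errors = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:
            errors.append((url, exc))
    raise OSError(f"all {len(urls)} candidate URLs failed: {errors}")
```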

hypergraphx.readwrite.load.search_remote_datasets(query=None, *, tags=None, match_all_tags=True, source=None, license=None, min_nodes=None, max_nodes=None, min_edges=None, max_edges=None, timeout=30, verify_ssl=False, catalog_url=None)

Search the remote Hypergraphx-data catalog.

Parameters:

query (str, optional) – Case-insensitive substring matched against dataset names and tags.
tags (str | Iterable[str], optional) – Tags/categories to require. Matching is case-insensitive.
match_all_tags (bool, default=True) – If True, all requested tags must be present. If False, any requested tag is enough.
source (str, optional) – Case-insensitive substring matched against the source URL/text.
license (str, optional) – Case-insensitive substring matched against the license identifier/text.
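min_nodes (int, optional) – Inclusive lower bound on the catalog vertices count.
max_nodes (int, optional) – Inclusive upper bound on the catalog vertices count.
min_edges (int, optional) – Inclusive lower bound on the catalog edges count.
max_edges (int, optional) – Inclusive upper bound on the catalog edges count.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates when downloading the catalog.
catalog_url (str, optional) – Catalog metadata URL.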

Returns:

Matching catalog entries in catalog order.

Return type:

list[dict]
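A local re-implementation of part of this filtering makes the semantics concrete: substring matching on name and tags, inclusive bounds on vertices and edges, and catalog order preserved. search_entries is a sketch that assumes the catalog keys name, tags, vertices, and edges; it omits the tags, source, and license filters. The real call would be, for example, search_remote_datasets("contacts", max_nodes=500).

```python
def search_entries(entries, query=None, min_nodes=None, max_nodes=None,
                   min_edges=None, max_edges=None):
    """Sketch of part of search_remote_datasets over a local list of entries."""

    def in_range(value, low, high):
        # Both bounds are inclusive; None disables a bound.
        return (low is None or value >= low) and (high is None or value <= high)

    results = []
    for entry in entries:
        if query is not None:
            haystack = [entry.get("name", "")] + list(entry.get("tags", []))
            if not any(query.lower() in text.lower() for text in haystack):
                continue
        if not in_range(entry["vertices"], min_nodes, max_nodes):
            continue
        if not in_range(entry["edges"], min_edges, max_edges):
            continue
        results.append(entry)  # iteration order = catalog order
    return results
```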

hypergraphx.readwrite.save module

hypergraphx.readwrite.save.save_hypergraph(hypergraph, file_name, *, fmt='json', binary=None)

Save a hypergraph to disk.

Parameters:

hypergraph – Hypergraph-like object.
file_name (str) – Output file path.
fmt ({"json", "pickle"}) – Output format (default: "json").
binary (bool | None) – Backward-compatible alias for fmt="pickle" when True. If provided, overrides fmt and emits a DeprecationWarning.
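The fmt/binary interplay can be sketched as follows. resolve_save_format is hypothetical, and mapping binary=False to "json" is an assumption the parameter description does not state outright. New code should call save_hypergraph(H, "graph.pkl", fmt="pickle") rather than passing binary=True.

```python
import warnings


def resolve_save_format(fmt="json", binary=None):
    """Sketch of the documented fmt/binary interplay (not the library code).

    Assumption: binary=False selects "json".
    """
    if binary is not None:
        # binary overrides fmt and is deprecated.
        warnings.warn("binary= is deprecated; use fmt='pickle' instead",
                      DeprecationWarning, stacklevel=2)
        return "pickle" if binary else "json"
    if fmt not in ("json", "pickle"):
        raise ValueError(f"unsupported fmt: {fmt!r}")
    return fmt
```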

hypergraphx.readwrite.hif module

hypergraphx.readwrite.hif.read_hif(path)

Load a hypergraph from a HIF file.

Parameters:

path (str) – The path to the HIF file.

Returns:

The loaded hypergraph.

Return type:

Hypergraph

hypergraphx.readwrite.hif.write_hif(H, path)

Save a hypergraph to a HIF file.

Parameters:

H (Hypergraph) – The hypergraph to save.
path (str) – The path to save the hypergraph to.
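HIF is a JSON-based interchange format. The sketch below builds a minimal HIF-style document from a list of hyperedges using only the standard library; the field names (an "incidences" array of edge/node records, plus optional "nodes", "edges", and "network-type") reflect our reading of the HIF schema and should be checked against it. It does not reproduce what write_hif emits; for hypergraphx objects, the reliable path is write_hif(H, "h.json") followed by read_hif("h.json").

```python
import json


def edges_to_hif(edges):
    """Build a minimal HIF-style dict from a list of hyperedges (node lists).

    Assumed schema: required "incidences" records with "edge" and "node",
    optional "nodes", "edges", and "network-type".
    """
    nodes = sorted({n for edge in edges for n in edge})
    return {
        "network-type": "undirected",
        "nodes": [{"node": n} for n in nodes],
        "edges": [{"edge": i} for i in range(len(edges))],
        # One incidence record per (hyperedge, member node) pair.
        "incidences": [{"edge": i, "node": n} for i, edge in enumerate(edges) for n in edge],
    }


doc = edges_to_hif([[1, 2, 3], [2, 4]])
text = json.dumps(doc, indent=2)
```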

hypergraphx.readwrite.io_json module

hypergraphx.readwrite.io_json.load_json_file(file_name)

hypergraphx.readwrite.io_json.save_json_hypergraph(hypergraph, file_name)

hypergraphx.readwrite.io_pickle module

hypergraphx.readwrite.io_pickle.load_pickle(file_name)

hypergraphx.readwrite.io_pickle.save_pickle(obj, file_name)

hypergraphx.readwrite.hashing module

hypergraphx.readwrite.hashing.hash_hypergraph(hypergraph)

Generates a SHA-256 hash of a hypergraph based on its exposed attributes.

Parameters:

hypergraph (object) – The hypergraph instance to hash. Should implement expose_attributes_for_hashing.

Returns:

The SHA-256 hash hex digest of the hypergraph.

Return type:

str
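The general idea, hashing a canonical serialization of the exposed attributes, can be sketched with hashlib. hash_attributes is hypothetical and assumes JSON-serializable attributes; hypergraphx's own serialization may differ, so these digests are not comparable with hash_hypergraph output. In practice, hash_hypergraph(H1) == hash_hypergraph(H2) is a cheap way to check whether two loaded hypergraphs expose identical attributes.

```python
import hashlib
import json


def hash_attributes(attributes):
    """SHA-256 hex digest over a canonical JSON dump of a dict.

    sort_keys makes the digest independent of key insertion order;
    default=str is a fallback for non-JSON values (assumption).
    """
    payload = json.dumps(attributes, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```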

Module contents

hypergraphx.readwrite.download_remote_dataset(dataset_name, *, fmt='hgx', timeout=30, verify_ssl=False, cache_dir=None, overwrite=False, catalog_url=None, use_catalog=True, dataset_info=None)

Download and cache a remote dataset without loading it into memory.

Parameters:

dataset_name (str) – Dataset identifier, such as "zoo" or "contacts-hospital".
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to download. If explicitly set to None, JSON URLs are tried first, then binary URLs.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when a matching cached file exists.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
use_catalog (bool, default=True) – If True, resolve download URLs from the remote catalog before falling back to legacy hard-coded URL patterns.
dataset_info (dict, optional) – Already loaded catalog entry. Passing this avoids reloading the catalog when downloading many datasets.

Returns:

Local decompressed cache path, suitable for load_hypergraph(...).

Return type:

pathlib.Path

hypergraphx.readwrite.download_remote_datasets(dataset_names=None, *, attributes=None, match_all=True, fmt='hgx', timeout=30, verify_ssl=False, cache_dir=None, overwrite=False, catalog_url=None, continue_on_error=False, progress_callback=None)

Download and cache multiple remote datasets.

Parameters:

dataset_names (str | Iterable[str], optional) – Dataset names, filenames, or directories to download explicitly.
attributes (str | Iterable[str], optional) – Tag/category names used to select datasets from the catalog. If both dataset_names and attributes are provided, named datasets are filtered by the requested attributes.
match_all (bool, default=True) – If True, selected datasets must contain all requested attributes. If False, any requested attribute is enough.
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to download.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when matching cached files exist.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
continue_on_error (bool, default=False) – If True, keep downloading after a dataset fails and store the exception in that dataset’s result record. If False, raise on the first failure.
progress_callback (callable, optional) – Called after each dataset with its result record.

Returns:

Mapping from canonical dataset name to records with path, metadata, error, and status fields.

Return type:

dict

hypergraphx.readwrite.get_remote_dataset_info(dataset_name, *, timeout=30, verify_ssl=False, catalog_url=None)

Return the full catalog entry for a remote dataset.

dataset_name is matched against the catalog name, filename, and directory fields.

hypergraphx.readwrite.iter_remote_hypergraphs(attributes=None, *, names=None, match_all=True, fmt='hgx', timeout=30, verify_ssl=False, catalog_url=None, include_metadata=False, store=True, cache_dir=None, overwrite=False)

Yield remote hypergraphs selected by name or catalog tags/categories.

Parameters:

attributes (str | Iterable[str], optional) – Tag/category names to match, such as "Undirected" or ["Undirected", "Temporal"]. Matching is case-insensitive.
names (str | Iterable[str], optional) – Dataset names, filenames, or directories to load explicitly. If omitted, datasets are selected from attributes.
match_all (bool, default=True) – If True, a dataset must contain all requested attributes. If False, any requested attribute is enough.
fmt ({"hgx", "binary", "json"}, default="hgx") – Remote format to load for each matching dataset.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates for remote requests.
catalog_url (str, optional) – Catalog metadata URL used for filtering.
include_metadata (bool, default=False) – If True, yield (hypergraph, dataset_info) pairs. Otherwise yield only the hypergraph object.
store (bool, default=True) – Store downloaded datasets locally before loading them.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download matching datasets even when cached files exist.

Notes

This is a generator: datasets are downloaded and loaded lazily as the iterator advances.

hypergraphx.readwrite.list_remote_datasets(*, timeout=30, verify_ssl=False, catalog_url=None)

List datasets advertised by the remote Hypergraphx-data catalog.

Returns a list of dictionaries with at least the following keys:

- name
- tags / categories
- vertices
- edges

Parameters:

timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates when downloading the catalog. Defaults to False for compatibility with the current dataset server.
catalog_url (str, optional) – Catalog metadata URL. Defaults to the Hypergraphx-data GitHub raw URL, or HYPERGRAPHX_DATA_CATALOG_URL if set.

Notes

catalog_url can point to the generated catalog.json file, a JSON list, or the legacy related-data.js file used by the website.

hypergraphx.readwrite.load_any(obj_or_path)

hypergraphx.readwrite.load_hypergraph(file_name, *, fmt=None)

Load a hypergraph from disk.

Parameters:

file_name (str or path-like) – Input file path.
fmt ({"json", "pickle", "hgr"} | None) – Optional override for the input format. If None (default), infer format from the file extension. Gzipped files with .gz suffix are supported for each local format, such as .json.gz and .hgx.gz.

hypergraphx.readwrite.load_hypergraph_from_server(dataset_name, *, fmt='hgx', as_dict=False, timeout=30, verify_ssl=False, store=True, cache_dir=None, overwrite=False, catalog_url=None, use_catalog=True, dataset_info=None)

Load a dataset by name from the remote Hypergraphx-data server.

Parameters:

dataset_name (str) – Dataset identifier, such as "zoo" or "contacts-hospital".
fmt ({"hgx", "binary", "json"} or None, default="hgx") – Remote format to load. "hgx" and "binary" load the compact binary Hypergraphx format; "json" loads the JSON format. If explicitly set to None, JSON URLs are tried first, then binary URLs.
as_dict (bool, default=False) – If True, return the exposed internal data-structure dictionary instead of a hypergraph object.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates. Defaults to False for compatibility with the current dataset server certificate chain.
store (bool, default=True) – Store the decompressed remote dataset locally before loading it. Cached files are reused on later calls.
cache_dir (path-like, optional) – Cache directory. Defaults to ~/.cache/hypergraphx/datasets or the HYPERGRAPHX_DATA_CACHE environment variable.
overwrite (bool, default=False) – If True, re-download even when a matching cached file exists.
catalog_url (str, optional) – Catalog metadata URL used to resolve dataset download URLs.
use_catalog (bool, default=True) – If True, resolve download URLs from the remote catalog before falling back to legacy hard-coded URL patterns.
dataset_info (dict, optional) – Already loaded catalog entry. Passing this avoids reloading the catalog when loading many datasets.

Returns:

Loaded hypergraph object, or its exposed dictionary if as_dict=True.

Return type:

Hypergraph | DirectedHypergraph | TemporalHypergraph | MultiplexHypergraph | dict

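Notes

The loader tries current per-dataset .json.gz / .hgx.gz URLs first and keeps older flat URLs as fallbacks. When store=True, compressed downloads are decompressed before being written to the cache.

hypergraphx.readwrite.read_hif(path)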

Load a hypergraph from a HIF file.

Parameters:

path (str) – The path to the HIF file.

Returns:

The loaded hypergraph.

Return type:

Hypergraph

hypergraphx.readwrite.save_hypergraph(hypergraph, file_name, *, fmt='json', binary=None)

Save a hypergraph to disk.

Parameters:

hypergraph – Hypergraph-like object.
file_name (str) – Output file path.
fmt ({"json", "pickle"}) – Output format (default: "json").
binary (bool | None) – Backward-compatible alias for fmt="pickle" when True. If provided, overrides fmt and emits a DeprecationWarning.

hypergraphx.readwrite.search_remote_datasets(query=None, *, tags=None, match_all_tags=True, source=None, license=None, min_nodes=None, max_nodes=None, min_edges=None, max_edges=None, timeout=30, verify_ssl=False, catalog_url=None)

Search the remote Hypergraphx-data catalog.

Parameters:

query (str, optional) – Case-insensitive substring matched against dataset names and tags.
tags (str | Iterable[str], optional) – Tags/categories to require. Matching is case-insensitive.
match_all_tags (bool, default=True) – If True, all requested tags must be present. If False, any requested tag is enough.
source (str, optional) – Case-insensitive substring matched against the source URL/text.
license (str, optional) – Case-insensitive substring matched against the license identifier/text.
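min_nodes (int, optional) – Inclusive lower bound on the catalog vertices count.
max_nodes (int, optional) – Inclusive upper bound on the catalog vertices count.
min_edges (int, optional) – Inclusive lower bound on the catalog edges count.
max_edges (int, optional) – Inclusive upper bound on the catalog edges count.
timeout (int, default=30) – Download timeout in seconds.
verify_ssl (bool, default=False) – Whether to verify TLS certificates when downloading the catalog.
catalog_url (str, optional) – Catalog metadata URL.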

Returns:

Matching catalog entries in catalog order.

Return type:

list[dict]
