io_util
-
class
hanlp.utils.io_util.
NumpyEncoder
(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None) Constructor for JSONEncoder, with sensible defaults.
If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.
If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.
If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.
If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.
If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.
If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.
If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.
If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.
-
default
(obj) Special JSON encoder for NumPy types. See https://interviewbubble.com/typeerror-object-of-type-float32-is-not-json-serializable/
- Parameters
obj – Object to be json encoded.
- Returns
A JSON-encodable version of obj.
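As an illustration of the default hook described above, here is a minimal sketch of a hypothetical encoder (ToListEncoder is not part of HanLP and is not its NumpyEncoder). It converts any object that exposes tolist(), as NumPy arrays and scalars do, into plain Python values. A standard-library array.array stands in for an ndarray so the sketch runs without NumPy installed.

```python
import json
from array import array


class ToListEncoder(json.JSONEncoder):
    """Hypothetical encoder for illustration; not HanLP's NumpyEncoder."""

    def default(self, obj):
        # default() is called only for objects json cannot serialize itself.
        if hasattr(obj, "tolist"):
            return obj.tolist()
        # Defer to the base class, which raises TypeError.
        return super().default(obj)


# array.array is a stand-in for a NumPy array in this sketch.
payload = {"scores": array("d", [1.0, 2.5]), "count": 2}

# sort_keys gives a stable key order; these separators remove all whitespace.
compact = json.dumps(payload, cls=ToListEncoder, sort_keys=True, separators=(",", ":"))
print(compact)
```

Objects without a tolist() method still fall through to the base class and raise TypeError, which matches the contract of default stated above.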
-
-
hanlp.utils.io_util.
check_outdated
(package='hanlp', version='2.1.0-alpha.30', repository_url='https://pypi.python.org/pypi/%s/json') Given the name of a package on PyPI and a version (both strings), check whether the given version is the latest version of the package available. Returns a 2-tuple (installed_version, latest_version). repository_url is a %-style format string that allows a different PyPI repository URL to be used, e.g. test.pypi.org or a private repository; the string is formatted with the package name. Adapted from https://github.com/alexmojaki/outdated/blob/master/outdated/__init__.py
- Parameters
package – Package name.
version – Installed version string.
repository_url – %-style URL template of the PyPI repository, formatted with the package name.
- Returns
Parsed installed version and latest version.
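The lookup described above can be sketched as follows. This is not HanLP's implementation: latest_version and fetch_bytes are hypothetical names, and the sketch assumes the repository answers with the PyPI JSON layout, where the newest release is stored under info.version. The fetch argument exists only so the parsing step can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_bytes(url):
    """Download the raw response body of url (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def latest_version(package, repository_url="https://pypi.python.org/pypi/%s/json", fetch=fetch_bytes):
    """Hypothetical helper: return the latest version string of package."""
    # The template is %-formatted with the package name.
    url = repository_url % package
    metadata = json.loads(fetch(url))
    # Assumption: PyPI-style JSON with the newest release under info.version.
    return metadata["info"]["version"]
```

Comparing the result with the installed version requires a proper version parser; plain string comparison gives wrong answers for versions such as 2.10.0 and 2.9.0.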
-
hanlp.utils.io_util.
get_exitcode_stdout_stderr
(cmd) Execute the external command and get its exit code, stdout and stderr. See https://stackoverflow.com/a/21000308/3730690
- Parameters
cmd – Command.
- Returns
Exit code, stdout, stderr.
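A minimal sketch of this pattern with the standard subprocess module is shown below. It assumes cmd is a single shell-style string, and run_command is a hypothetical name rather than HanLP's function.

```python
import shlex
import subprocess
import sys


def run_command(cmd):
    """Hypothetical helper: run cmd and return (exitcode, stdout, stderr)."""
    # Split the command string into an argument list without invoking a shell.
    args = shlex.split(cmd)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() waits for the process and returns both streams as bytes.
    out, err = proc.communicate()
    return proc.returncode, out, err


# Run the current Python interpreter so the example does not depend on
# any particular external program being installed.
command = shlex.join([sys.executable, "-c", "import sys; print('hi'); sys.exit(3)"])
exitcode, out, err = run_command(command)
```

Both streams come back as bytes; decode them if text is needed.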
-
hanlp.utils.io_util.
get_resource
(path: str, save_dir='/Users/hankcs/.hanlp', extract=True, prefix='https://file.hankcs.com/hanlp/', append_location=True, verbose=False) Fetch the real (local) path of a resource (model, corpus, or anything else), storing it under save_dir.
- Parameters
path – A local path (which will be returned as is) or a remote URL (which will be downloaded, decompressed, then returned).
save_dir – Where to store the resource (Default value = hanlp.utils.io_util.hanlp_home()).
extract – Whether to unzip it if it’s a zip file (Default value = True).
prefix – A prefix which, when it matches a URL (path), marks that URL as official. Official resources will not go to a folder called thirdparty under IDX.
append_location – (Default value = True)
verbose – Whether to print log messages.
- Returns
The real path to the resource.
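The path rules above (local paths returned as is, official URLs kept out of thirdparty) can be sketched without any downloading. Everything here is an assumption made for illustration: local_path_for is a hypothetical helper, and the directory layout it produces is invented and may differ from what get_resource actually does.

```python
import os
from urllib.parse import urlparse


def local_path_for(path, save_dir, prefix="https://file.hankcs.com/hanlp/"):
    """Hypothetical helper: map a resource path to a local path (no download)."""
    parsed = urlparse(path)
    if parsed.scheme not in ("http", "https"):
        # A local path is returned as is.
        return path
    if path.startswith(prefix):
        # Official resource: keep the part of the URL after the prefix.
        relative = path[len(prefix):]
    else:
        # Assumed layout for other URLs: thirdparty/<host>/<url path>.
        relative = os.path.join("thirdparty", parsed.netloc, parsed.path.lstrip("/"))
    return os.path.join(save_dir, relative)
```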
-
hanlp.utils.io_util.
hanlp_home
() Home directory for HanLP resources.
- Returns
Data directory in the filesystem for storage, for example when downloading models.
This home directory can be customized with the following shell command or equivalent environment variable on Windows systems.
$ export HANLP_HOME=/data/hanlp
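The override can be pictured as an environment lookup with a fallback. The sketch below is an assumption about the mechanism, not HanLP's code: resolve_home is a hypothetical name, and ~/.hanlp is used as the fallback only for illustration, since the real default depends on the platform.

```python
import os


def resolve_home(default=None):
    """Hypothetical helper: HANLP_HOME if set, otherwise a fallback directory."""
    if default is None:
        # Assumed fallback for illustration only.
        default = os.path.join(os.path.expanduser("~"), ".hanlp")
    return os.environ.get("HANLP_HOME", default)
```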
-
hanlp.utils.io_util.
hanlp_home_default
() Default data directory, depending on the platform and environment variables.
-
hanlp.utils.io_util.
replace_ext
(filepath, ext) → str Replace the extension of filepath with ext.
- Parameters
filepath – Path whose extension is to be replaced.
ext – The new extension.
- Returns
A new path.
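One way to get this behaviour is os.path.splitext. The sketch below uses a hypothetical name, replace_ext_sketch, and assumes ext includes the leading dot; whether HanLP expects the dot is not stated here.

```python
import os


def replace_ext_sketch(filepath, ext):
    """Hypothetical helper: swap the last extension of filepath for ext."""
    # splitext splits off only the final extension, e.g. ".gz" in ".tar.gz".
    stem, _ = os.path.splitext(filepath)
    return stem + ext


new_path = replace_ext_sketch("corpus/train.txt", ".json")
```

Only the last extension is replaced, so archive.tar.gz becomes archive.tar.zip when ext is .zip.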
-
hanlp.utils.io_util.
stdout_redirected
(to='/dev/null', stdout=None) Redirect stdout elsewhere. Copied from https://stackoverflow.com/questions/4675728/redirect-stdout-to-a-file-in-python/22434262#22434262
- Parameters
to – Target device.
stdout – Source device.
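Redirection at the file-descriptor level, which also affects output written by C extensions and child processes, can be sketched as a context manager. This is an illustration under assumptions, not the function's actual source: stdout_redirected_sketch is a hypothetical name, and the stream passed as stdout must be backed by a real file descriptor.

```python
import os
import sys
from contextlib import contextmanager


@contextmanager
def stdout_redirected_sketch(to=os.devnull, stdout=None):
    """Hypothetical helper: temporarily point stdout's descriptor at `to`."""
    if stdout is None:
        stdout = sys.stdout
    stdout_fd = stdout.fileno()
    # Keep a duplicate of the original descriptor so it can be restored.
    with os.fdopen(os.dup(stdout_fd), "wb") as saved:
        stdout.flush()
        with open(to, "wb") as target:
            # From here on, writes to stdout_fd go to the target file.
            os.dup2(target.fileno(), stdout_fd)
        try:
            yield stdout
        finally:
            stdout.flush()
            # Restore the original destination.
            os.dup2(saved.fileno(), stdout_fd)
```

Wrapping a noisy call in `with stdout_redirected_sketch():` silences it, provided sys.stdout has not been replaced by an object without a file descriptor.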