Examples

Validate with Pydantic

This example shows how to use pydantic to validate and parse a NestedText file. The file in this case specifies deployment settings for a web server:

debug: false
secret_key: t=)40**y&883y9gdpuw%aiig+wtc033(ui@^1ur72w#zhw3_ch

allowed_hosts:
  - www.example.com

database:
  engine: django.db.backends.mysql
  host: db.example.com
  port: 3306
  user: www

webmaster_email: admin@example.com

Below is the code to parse this file. Note that basic types like integers, strings, Booleans, and lists are specified using standard type annotations. Dictionaries with specific keys are represented by model classes, and it is possible to reference one model from within another. Pydantic also has built-in support for validating email addresses, which we can take advantage of here:

#!/usr/bin/env python3

import nestedtext as nt
from pydantic import BaseModel, EmailStr
from typing import List
from pprint import pprint

class Database(BaseModel):
    engine: str
    host: str
    port: int
    user: str

class Config(BaseModel):
    debug: bool
    secret_key: str
    allowed_hosts: List[str]
    database: Database
    webmaster_email: EmailStr

obj = nt.load('deploy.nt')
config = Config.parse_obj(obj)

pprint(config.dict())

This produces the following data structure:

{'allowed_hosts': ['www.example.com'],
 'database': {'engine': 'django.db.backends.mysql',
              'host': 'db.example.com',
              'port': 3306,
              'user': 'www'},
 'debug': False,
 'secret_key': 't=)40**y&883y9gdpuw%aiig+wtc033(ui@^1ur72w#zhw3_ch',
 'webmaster_email': 'admin@example.com'}
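
To see what the model classes are doing, it can help to perform the same conversion by hand. The following is a minimal standard-library sketch, not a description of how pydantic works internally. The raw dictionary is written out by hand to resemble what the loader returns (every leaf is a string), and the to_bool helper and the spellings it accepts are assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List

# Hand-written stand-in for the loaded file: nested dicts and lists whose
# leaves are all strings.
raw = {
    'debug': 'false',
    'allowed_hosts': ['www.example.com'],
    'database': {'host': 'db.example.com', 'port': '3306'},
}

@dataclass
class Database:
    host: str
    port: int

@dataclass
class Config:
    debug: bool
    allowed_hosts: List[str]
    database: Database

def to_bool(text):
    # Hypothetical helper; the accepted spellings are an assumption.
    lowered = text.strip().lower()
    if lowered in ('true', 'yes', 'on', '1'):
        return True
    if lowered in ('false', 'no', 'off', '0'):
        return False
    raise ValueError(f'expected Boolean, found {text!r}')

def build_config(data):
    # Convert each string leaf to the type its field declares.
    db = data['database']
    return Config(
        debug=to_bool(data['debug']),
        allowed_hosts=list(data['allowed_hosts']),
        database=Database(host=db['host'], port=int(db['port'])),
    )

config = build_config(raw)
print(config)
```

Writing this out for every field quickly becomes tedious, which is the work a validation library takes over.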

Validate with Voluptuous

This example shows how to use voluptuous to validate and parse a NestedText file. It also demonstrates how to use the keymap argument of load() or loads() to add location information to Voluptuous error messages.

The input file is the same as in the previous example, i.e. deployment settings for a web server:

debug: false
secret_key: t=)40**y&883y9gdpuw%aiig+wtc033(ui@^1ur72w#zhw3_ch

allowed_hosts:
  - www.example.com

database:
  engine: django.db.backends.mysql
  host: db.example.com
  port: 3306
  user: www

webmaster_email: admin@example.com

Below is the code to parse this file. Note how the structure of the data is specified using basic Python objects. The Coerce() function is necessary to have voluptuous convert string input to the given type; otherwise it would simply check that the input matches the given type. Booleans need special care: Coerce(bool) would convert any non-empty string, including false, to True, so the Boolean() validator is used for debug instead:

#!/usr/bin/env python3

import nestedtext as nt
from voluptuous import Schema, Boolean, Coerce, MultipleInvalid
from inform import error, full_stop, terminate
from pprint import pprint

schema = Schema({
    'debug': Boolean(),
    'secret_key': str,
    'allowed_hosts': [str],
    'database': {
        'engine': str,
        'host': str,
        'port': Coerce(int),
        'user': str,
    },
    'webmaster_email': str,
})
try:
    keymap = {}
    raw = nt.load('deploy.nt', keymap=keymap)
    config = schema(raw)
except nt.NestedTextError as e:
    e.terminate()
except MultipleInvalid as e:
    for err in e.errors:
        kind = 'key' if 'key' in err.msg else 'value'
        loc = keymap[tuple(err.path)]
        error(full_stop(err.msg), culprit=err.path, codicil=loc.as_line(kind))
    terminate()

pprint(config)

This produces the same result as in the previous example.
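
The way the keymap ties a validation error back to the input file can be sketched without either library. In the sketch below everything is a stand-in: keymap is written by hand and holds plain line numbers (the real keymap holds location objects), and coerce and validate are hypothetical helpers that only imitate the idea of a coercing schema that reports the key path of each failure:

```python
# Hand-written stand-in for the keymap filled in by the loader: it maps a
# tuple of keys to a location. Plain line numbers are an assumption.
keymap = {
    ('debug',): 1,
    ('database', 'port'): 10,
}

def coerce(type_):
    # Return a validator that casts its argument to type_.
    def validator(value):
        try:
            return type_(value)
        except (TypeError, ValueError):
            raise ValueError(f'expected {type_.__name__}') from None
    return validator

def validate(data, schema, path=()):
    # Walk data and schema together, collecting (path, message) pairs.
    result, errors = {}, []
    for key, value in data.items():
        check = schema[key]
        if isinstance(check, dict):
            result[key], nested_errors = validate(value, check, path + (key,))
            errors.extend(nested_errors)
        else:
            try:
                result[key] = check(value)
            except ValueError as e:
                errors.append((path + (key,), str(e)))
    return result, errors

schema = {'debug': str, 'database': {'port': coerce(int)}}
raw = {'debug': 'false', 'database': {'port': 'three thousand'}}

config, errors = validate(raw, schema)
for path, msg in errors:
    # The key path of the error is the index into the keymap.
    print(f'line {keymap[path]}: {", ".join(path)}: {msg}.')
```

The important point is that the error carries the path of keys leading to the offending value, and that same tuple of keys is what the keymap is indexed by.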

JSON to NestedText

This example implements a command-line utility that converts a JSON file to NestedText. It demonstrates the use of dumps() and NestedTextError.

#!/usr/bin/env python3
"""
Read a JSON file and convert it to NestedText.

usage:
    json-to-nestedtext [options] [<filename>]

options:
    -f, --force            force overwrite of output file
    -i <n>, --indent <n>   number of spaces per indent [default: 4]
    -w <n>, --width <n>    desired maximum line width; specifying enables
                           use of single-line lists and dictionaries as long
                           as they fit in the given width [default: 0]

If <filename> is not given, JSON input is taken from stdin and NestedText output 
is written to stdout.
"""

from docopt import docopt
from inform import done, fatal, full_stop, os_error, warn
from pathlib import Path
import json
import nestedtext as nt
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')

cmdline = docopt(__doc__)
input_filename = cmdline['<filename>']
try:
    indent = int(cmdline['--indent'])
except Exception:
    warn('expected positive integer for indent.', culprit=cmdline['--indent'])
    indent = 4
try:
    width = int(cmdline['--width'])
except Exception:
    warn('expected non-negative integer for width.', culprit=cmdline['--width'])
    width = 0

try:
    # read JSON content; from file or from stdin
    if input_filename:
        input_path = Path(input_filename)
        json_content = input_path.read_text(encoding='utf-8')
    else:
        json_content = sys.stdin.read()
    data = json.loads(json_content)

    # convert to NestedText
    nestedtext_content = nt.dumps(data, indent=indent, width=width) + "\n"

    # output NestedText content; to file or to stdout
    if input_filename:
        output_path = input_path.with_suffix('.nt')
        if output_path.exists():
            if not cmdline['--force']:
                fatal('file exists, use -f to force over-write.', culprit=output_path)
        output_path.write_text(nestedtext_content, encoding='utf-8')
    else:
        sys.stdout.write(nestedtext_content)

except OSError as e:
    fatal(os_error(e))
except nt.NestedTextError as e:
    e.terminate()
except KeyboardInterrupt:
    done()
except json.JSONDecodeError as e:
    # create a nice error message with surrounding context
    msg = e.msg
    culprit = input_filename
    codicil = None
    try:
        lineno = e.lineno
        culprit = (culprit, lineno)
        colno = e.colno
        lines_before = e.doc.split('\n')[lineno-2:lineno]
        lines = []
        for i, l in zip(range(lineno-len(lines_before), lineno), lines_before):
            lines.append(f'{i+1:>4}> {l}')
        lines_before = '\n'.join(lines)
        lines_after = e.doc.split('\n')[lineno:lineno+1]
        lines = []
        for i, l in zip(range(lineno, lineno + len(lines_after)), lines_after):
            lines.append(f'{i+1:>4}> {l}')
        lines_after = '\n'.join(lines)
        codicil = f"{lines_before}\n     {colno*' '}^\n{lines_after}"
    except Exception:
        pass
    fatal(full_stop(msg), culprit=culprit, codicil=codicil)
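
The handler for json.JSONDecodeError relies on three attributes of the exception: lineno and colno, which are 1-based, and doc, which holds the text being parsed. A stripped-down sketch of the same idea follows; describe_json_error is a hypothetical helper and the report layout is an assumption rather than what the utility prints:

```python
import json

def describe_json_error(text):
    # Return a short report locating the first syntax error, or None.
    try:
        json.loads(text)
    except json.JSONDecodeError as e:
        # lineno and colno are 1-based; doc is the text that was parsed.
        bad_line = e.doc.split('\n')[e.lineno - 1]
        pointer = ' ' * (e.colno - 1) + '^'
        return f'{e.msg} (line {e.lineno}, column {e.colno})\n{bad_line}\n{pointer}'
    return None

report = describe_json_error('{\n    "name": "Fumiko",\n    "phone": \n}')
print(report)
```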

Be aware that not all JSON data can be converted to NestedText, and in the conversion much of the type information is lost.
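
The loss of type information is easy to see in a small sketch. Here stringify is a hypothetical helper that imitates the effect of the conversion by turning every scalar into a string; rendering None as an empty string is an assumption made for illustration:

```python
import json

def stringify(value):
    # Hypothetical helper: every scalar leaf becomes a string.
    if isinstance(value, dict):
        return {str(k): stringify(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify(v) for v in value]
    if value is None:
        return ''
    return str(value)

data = json.loads('{"port": 3306, "debug": false, "ratio": 0.5, "note": null}')
flattened = stringify(data)
print(flattened)
```

After the conversion the integer 3306 and the string "3306" are indistinguishable, so a program that reads the data back needs a schema to recover the intended types.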

json-to-nestedtext can be used as a JSON pretty printer:

> json-to-nestedtext < fumiko.json
treasurer:
    name: Fumiko Purvis
    address:
        > 3636 Buffalo Ave
        > Topeka, Kansas 20692
    phone: 1-268-555-0280
    email: fumiko.purvis@hotmail.com
    additional roles:
        - accounting task force

NestedText to JSON

This example implements a command-line utility that converts a NestedText file to JSON. It demonstrates the use of load() and NestedTextError.

#!/usr/bin/env python3
"""
Read a NestedText file and convert it to JSON.

usage:
    nestedtext-to-json [options] [<filename>]

options:
    -f, --force   force overwrite of output file
    -d, --dedup   de-duplicate keys in dictionaries

If <filename> is not given, NestedText input is taken from stdin and JSON output 
is written to stdout.
"""

from docopt import docopt
from inform import done, fatal, os_error
from pathlib import Path
import json
import nestedtext as nt
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')


def de_dup(key, value, data, state):
    if key not in state:
        state[key] = 1
    state[key] += 1
    return f"{key}#{state[key]}"


cmdline = docopt(__doc__)
input_filename = cmdline['<filename>']
on_dup = de_dup if cmdline['--dedup'] else None

try:
    if input_filename:
        input_path = Path(input_filename)
        data = nt.load(input_path, top='any', on_dup=on_dup)
        json_content = json.dumps(data, indent=4, ensure_ascii=False)
        output_path = input_path.with_suffix('.json')
        if output_path.exists():
            if not cmdline['--force']:
                fatal('file exists, use -f to force over-write.', culprit=output_path)
        output_path.write_text(json_content, encoding='utf-8')
    else:
        data = nt.load(sys.stdin, top='any', on_dup=on_dup)
        json_content = json.dumps(data, indent=4, ensure_ascii=False)
        sys.stdout.write(json_content + '\n')
except OSError as e:
    fatal(os_error(e))
except nt.NestedTextError as e:
    e.terminate()
except KeyboardInterrupt:
    done()
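
The renaming performed by de_dup can be tried in isolation. In the sketch below, collect is a hypothetical driver that stands in for the loader: it asks de_dup for a replacement key whenever a key is already present. How the loader actually invokes the callback is not shown here, so treat the calling convention as an assumption:

```python
def de_dup(key, value, data, state):
    # Same logic as the callback above: the first repeat of a key becomes
    # key#2, the next key#3, and so on.
    if key not in state:
        state[key] = 1
    state[key] += 1
    return f"{key}#{state[key]}"

def collect(pairs):
    # Hypothetical driver: insert pairs into a dict, renaming duplicates.
    data, state = {}, {}
    for key, value in pairs:
        if key in data:
            key = de_dup(key, value, data, state)
        data[key] = value
    return data

result = collect([('name', 'Thor'), ('name', 'Loki'), ('name', 'Frigg')])
print(result)
```

The first occurrence keeps its original key, so only the repeats carry a numeric suffix.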

CSV to NestedText

This example implements a command-line utility that converts a CSV file to NestedText. It demonstrates the use of the converters argument to dumps(), which is used to cull empty dictionary fields.

#!/usr/bin/env python3
"""
Read a CSV file and convert it to NestedText.

usage:
    csv-to-nestedtext [options] [<filename>]

options:
    -n, --names            first row contains column names
    -c, --cull             remove empty fields (only for --names)
    -f, --force            force overwrite of output file
    -i <n>, --indent <n>   number of spaces per indent [default: 4]

If <filename> is not given, CSV input is taken from stdin and NestedText output 
is written to stdout.

If --names is specified, then the first line is assumed to hold the column/field 
names with the remaining lines containing the data.  In this case the output is 
a list of dictionaries.  Otherwise every line contains data and that data is 
output as a list of lists.
"""

from docopt import docopt
from inform import cull, done, fatal, full_stop, os_error, warn
from pathlib import Path
import csv
import nestedtext as nt
import sys
sys.stdin.reconfigure(encoding='utf-8')
sys.stdout.reconfigure(encoding='utf-8')

cmdline = docopt(__doc__)
input_filename = cmdline['<filename>']
try:
    indent = int(cmdline['--indent'])
except Exception:
    warn('expected positive integer for indent.', culprit=cmdline['--indent'])
    indent = 4

# strip dictionaries of empty fields if requested
converters = {dict: cull} if cmdline['--cull'] else {}

try:
    # read CSV content; from file or from stdin
    if input_filename:
        input_path = Path(input_filename)
        csv_content = input_path.read_text(encoding='utf-8')
    else:
        csv_content = sys.stdin.read()
    if cmdline['--names']:
        data = csv.DictReader(csv_content.splitlines())
    else:
        data = csv.reader(csv_content.splitlines())

    # convert to NestedText
    nt_content = nt.dumps(data, indent=indent, converters=converters) + "\n"

    # output NestedText content; to file or to stdout
    if input_filename:
        output_path = input_path.with_suffix('.nt')
        if output_path.exists():
            if not cmdline['--force']:
                fatal('file exists, use -f to force over-write.', culprit=output_path)
        output_path.write_text(nt_content, encoding='utf-8')
    else:
        sys.stdout.write(nt_content)

except OSError as e:
    fatal(os_error(e))
except nt.NestedTextError as e:
    e.terminate()
except csv.Error as e:
    fatal(full_stop(e), culprit=(input_filename, data.line_num))
except KeyboardInterrupt:
    done()
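
The two shapes of data described in the usage message, and the effect of culling, can be seen with the csv module alone. In this sketch cull_empty is a hypothetical stand-in for the cull function imported from inform, and the CSV content is made up for illustration:

```python
import csv

def cull_empty(row):
    # Hypothetical stand-in for cull(): drop fields whose value is empty.
    return {k: v for k, v in row.items() if v}

csv_content = (
    "name,phone,email\n"
    "Fumiko,1-268-555-0280,\n"
    "Margaret,,margaret.hodge@ku.edu\n"
)

# With column names: a list of dictionaries, one per data row.
with_names = [cull_empty(r) for r in csv.DictReader(csv_content.splitlines())]

# Without column names: a list of lists, header row included.
without_names = list(csv.reader(csv_content.splitlines()))

print(with_names)
print(without_names)
```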

PyTest

This example highlights parametrize_from_file, a pytest package that lets you neatly separate your test code from your test cases, with the test cases held in a NestedText file. Since test cases often contain code snippets, the ability of NestedText to hold arbitrary strings without quoting or escaping results in very clean and simple test case specifications. In addition, using eval in the test code allows the fields in the test cases to be literal Python code.

The test cases:

# test_misc.nt
test_substitution:
  -
    given:   first  second
    search: ^\s*(\w+)\s*(\w+)\s*$
    replace: \2 \1
    expected: second first
  -
    given: 4 * 7
    search: ^\s*(\d+)\s*([-+*/])\s*(\d+)\s*$
    replace: \1 \3 \2
    expected: 4 7 *

test_expression:
  -
    given: 1 + 2
    expected: 3
  -
    given: "1" + "2"
    expected: "12"
  -
    given: pathlib.Path("/") / "tmp"
    expected: pathlib.Path("/tmp")

And the corresponding test code:

# test_misc.py
import parametrize_from_file
import re
import pathlib

@parametrize_from_file
def test_substitution(given, search, replace, expected):
    assert re.sub(search, replace, given) == expected

@parametrize_from_file
def test_expression(given, expected):
    assert eval(given) == eval(expected)
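
To make the role of eval concrete, the same cases can be checked by hand without pytest. This sketch writes the cases out as Python data (every field is a string, as it would be when read from the NestedText file); the check_substitution and check_expression names are hypothetical:

```python
import pathlib
import re

# The cases from the NestedText file, written out as Python data.
substitution_cases = [
    dict(given='first  second', search=r'^\s*(\w+)\s*(\w+)\s*$',
         replace=r'\2 \1', expected='second first'),
    dict(given='4 * 7', search=r'^\s*(\d+)\s*([-+*/])\s*(\d+)\s*$',
         replace=r'\1 \3 \2', expected='4 7 *'),
]
expression_cases = [
    dict(given='1 + 2', expected='3'),
    dict(given='"1" + "2"', expected='"12"'),
    dict(given='pathlib.Path("/") / "tmp"', expected='pathlib.Path("/tmp")'),
]

def check_substitution(given, search, replace, expected):
    # Regular expressions and replacements are used as plain strings.
    return re.sub(search, replace, given) == expected

def check_expression(given, expected):
    # eval() turns the string fields back into Python objects. Only do
    # this with test files you control.
    return eval(given) == eval(expected)

results = [check_substitution(**case) for case in substitution_cases]
results += [check_expression(**case) for case in expression_cases]
print(results)
```

Note that the regular expressions contain backslashes and the expressions contain quotes, yet none of it needed escaping in the NestedText file.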

Pretty Printing

Besides being a readable file format, NestedText makes a reasonable display format for structured data. This example further simplifies the output by stripping leading multiline string tags.

>>> import nestedtext as nt
>>> import re
>>>
>>> def pp(data):
...     try:
...         text = nt.dumps(data, default=repr)
...         print(re.sub(r'^(\s*)[>:]\s?(.*)$', r'\1\2', text, flags=re.M))
...     except nt.NestedTextError as e:
...         e.report()

>>> addresses = nt.load('examples/address.nt')

>>> pp(addresses['Katheryn McDaniel'])
position: president
address:
    138 Almond Street
    Topeka, Kansas 20697
phone:
    cell: 1-210-555-5297
    home: 1-210-555-8470
email: KateMcD@aol.com
additional roles:
    - board member
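
The effect of the regular expression in pp can be checked on a literal string, with no need for NestedText itself. The sample text below is typed in by hand to resemble the output of dumps():

```python
import re

text = "\n".join([
    "position: president",
    "address:",
    "    > 138 Almond Street",
    "    > Topeka, Kansas 20697",
    "phone:",
    "    cell: 1-210-555-5297",
])

# Remove a leading '>' or ':' tag (and one following space) from each line,
# keeping the indentation that precedes it.
stripped = re.sub(r'^(\s*)[>:]\s?(.*)$', r'\1\2', text, flags=re.M)
print(stripped)
```

Only lines whose first non-blank character is a tag are changed, so ordinary key: value lines pass through untouched.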

Normalizing Keys

With data files created by non-programmers it is often desirable to allow a certain amount of flexibility in the keys. For example, you may wish to ignore case and be tolerant of extra spacing. However, the end application often needs the keys to be specific values. It is possible to normalize the keys using a schema, but this can interfere with error reporting. Imagine there is an error in the value associated with a set of keys. If the keys have been changed by the schema, the keymap can no longer be used to convert the keys into a line number for an error message. NestedText provides the normalize_key argument to load() and loads() to address this issue. It allows you to pass in a function that normalizes the keys before the keymap is created, releasing the schema from that task.

The following contact look-up program demonstrates both the normalization of keys and the associated error reporting. In this case, the first level of keys contains the names of the contacts and should not be normalized. Keys at all other levels are considered keywords and so should be normalized.

#!/usr/bin/env python3
"""
Display Contact Information

Usage:
    contact <name>
"""

from docopt import docopt
from inform import display, error, full_stop, indent, os_error, terminate
import nestedtext as nt
from voluptuous import Schema, Required, Any, MultipleInvalid
import re

contacts_file = "address.nt"

def normalize_key(key, parent_keys):
    if len(parent_keys) == 0:
        return key
    return ' '.join(key.lower().split())

def render_contact(data):
    text = nt.dumps(data, default=repr)
    return (re.sub(r'^(\s*)[>:]\s?(.*)$', r'\1\2', text, flags=re.M))

cmdline = docopt(__doc__)
name = cmdline['<name>']

try:
    # define structure of contacts database
    contacts_schema = Schema({
        str: {
            'position': str,
            'address': str,
            'phone': Required(Any({str:str},str)),
            'email': Required(Any({str:str},str)),
            'additional roles': Any(list,str),
        }
    })

    # read contacts database
    contacts = contacts_schema(
        nt.load(
            contacts_file,
            top = 'dict',
            normalize_key = normalize_key,
            keymap = (keymap:={})
        )
    )

    # display requested contact
    for fullname, contact_info in contacts.items():
        if name.lower() in fullname.lower():
            display(fullname)
            display(indent(render_contact(contact_info)))

except nt.NestedTextError as e:
    e.report()
except MultipleInvalid as e:
    for err in e.errors:
        kind = 'key' if 'key' in err.msg else 'value'
        keys = tuple(err.path)
        codicil = keymap[keys].as_line(kind) if keys in keymap else None
        error(
            full_stop(err.msg),
            culprit = (contacts_file, nt.join_keys(keys, keymap=keymap)),
            codicil = codicil
        )
except OSError as e:
    error(os_error(e))
terminate()

This program takes a name as a command line argument and prints out the corresponding contact information. It uses the pretty print idea from the previous section to render that information. Voluptuous checks the validity of the contacts database, which is shown next. Notice the variability in the keys given in Fumiko’s entry:

# Contact information for our officers

Katheryn McDaniel:
    position: president
    address:
        > 138 Almond Street
        > Topeka, Kansas 20697
    phone:
        cell: 1-210-555-5297
        home: 1-210-555-8470
    email: KateMcD@aol.com
    additional roles:
        - board member

Margaret Hodge:
    position: vice president
    address:
        > 2586 Marigold Lane
        > Topeka, Kansas 20682
    phone: 1-470-555-0398
    email: margaret.hodge@ku.edu
    additional roles:
        - new membership task force
        - accounting task force

Fumiko Purvis:
    Position: treasurer
        # Fumiko's term is ending at the end of the year.
        # She will be replaced by Merrill Eldridge.
    Address:
        > 3636 Buffalo Ave
        > Topeka, Kansas 20692
    Phone: 1-268-555-0280
    EMail: fumiko.purvis@hotmail.com
    Additional  Roles:
        - accounting task force

Now, requesting Fumiko’s contact information gives:

Fumiko Purvis
    position: treasurer
    address:
        3636 Buffalo Ave
        Topeka, Kansas 20692
    phone: 1-268-555-0280
    email: fumiko.purvis@hotmail.com
    additional roles:
        - accounting task force

Notice that other than Fumiko’s name, the displayed keys are all normalized.
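
The normalization function can be exercised on its own. The sketch below repeats normalize_key from the program above and applies it to a few keys taken from the contacts file; the parent_keys tuples are written by hand on the assumption that they hold the keys leading down to the one being normalized:

```python
def normalize_key(key, parent_keys):
    # Top-level keys are contact names and are left alone.
    if len(parent_keys) == 0:
        return key
    # Everything else is lower-cased with runs of white space collapsed.
    return ' '.join(key.lower().split())

samples = [
    ('Fumiko Purvis', ()),
    ('Additional  Roles', ('Fumiko Purvis',)),
    ('EMail', ('Fumiko Purvis',)),
    ('Cell', ('Katheryn McDaniel', 'phone')),
]

normalized = [normalize_key(key, parents) for key, parents in samples]
print(normalized)
```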

References

This example illustrates how one can implement references or macros in NestedText. A reference allows you to define some content once and insert that content in multiple places in the document. This example also demonstrates a slightly different way to implement validation and conversion on a per-field basis with voluptuous. Finally, it includes key normalization, which allows the keys to be case insensitive and contain white space even though the program that uses the data prefers the keys to be lower case identifiers. The normalize_key function passed to load() is used to transform the keys to the desired form.

PostMortem is a program that generates a packet of information that is securely shared with your dependents in case of your death. Only the settings processing part of the package is shown here. Here is a configuration file that Odin might use to generate packets for his wife and kids:

my GPG ids: odin@norse-gods.com
sign with: @ my gpg ids
name template: {name}-{now:YYMMDD}
estate docs:
    - ~/home/estate/trust.pdf
    - ~/home/estate/will.pdf
    - ~/home/estate/deed-valhalla.pdf

recipients:
    Frigg:
        email: frigg@norse-gods.com
        category: wife
        attach: @ estate docs
        networth: odin
    Thor:
        email: thor@norse-gods.com
        category: kids
        attach: @ estate docs
    Loki:
        email: loki@norse-gods.com
        category: kids
        attach: @ estate docs

Notice that estate docs is defined at the top level. It is not a PostMortem setting; it simply defines a value that will be interpolated into a setting later. The interpolation is done by specifying @ along with the name of the reference as a value. So for example, in recipients attach is specified as @ estate docs. This causes the list of estate documents to be used as attachments. The same thing is done in sign with, which interpolates my gpg ids.
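
The interpolation described above can be sketched on a plain dictionary. This is a simplified, self-contained illustration: the expand function takes the settings as an explicit argument, the settings are typed in by hand with keys already in snake case, and only a fragment of the configuration is included:

```python
def to_snake_case(key):
    return '_'.join(key.strip().lower().split())

def expand(value, settings):
    # Replace any string of the form "@ name" with the top-level setting
    # of that name; recurse into dictionaries and lists.
    if isinstance(value, str):
        value = value.strip()
        if value.startswith('@'):
            return settings[to_snake_case(value[1:])]
        return value
    if isinstance(value, dict):
        return {k: expand(v, settings) for k, v in value.items()}
    if isinstance(value, list):
        return [expand(v, settings) for v in value]
    raise NotImplementedError(value)

settings = {
    'estate_docs': ['~/home/estate/trust.pdf', '~/home/estate/will.pdf'],
    'recipients': {
        'Thor': {'category': 'kids', 'attach': '@ estate docs'},
    },
}

expanded = expand(settings, settings)
print(expanded['recipients']['Thor']['attach'])
```

Because the reference name is passed through to_snake_case before the lookup, "@ estate docs" and "@ Estate Docs" both resolve to the same setting.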

Here is the code for validating and transforming the PostMortem settings:

#!/usr/bin/env python3

import nestedtext as nt
from pathlib import Path
from voluptuous import (
    Schema, Invalid, MultipleInvalid, Extra, Required, REMOVE_EXTRA
)
from pprint import pprint

# Settings schema
# First define some functions that are used for validation and coercion
def to_str(arg):
    if isinstance(arg, str):
        return arg
    raise Invalid('expected text')

def to_ident(arg):
    arg = to_str(arg)
    if arg.isidentifier():
        return arg
    raise Invalid('expected simple identifier')

def to_list(arg):
    if isinstance(arg, str):
        return arg.split()
    if isinstance(arg, dict):
        raise Invalid('expected list')
    return arg

def to_paths(arg):
    return [Path(p).expanduser() for p in to_list(arg)]

def to_email(arg):
    user, _, host = arg.partition('@')
    if '.' in host and '@' not in host:
        return arg
    raise Invalid('expected email address')

def to_emails(arg):
    return [to_email(e) for e in to_list(arg)]

def to_gpg_id(arg):
    try:
        return to_email(arg)      # gpg ID may be an email address
    except Invalid:
        try:
            int(arg, base=16)     # if not an email, it must be a hex key
            assert len(arg) >= 8  # at least 8 characters long
            return arg
        except (ValueError, AssertionError):
            raise Invalid('expected GPG id')

def to_gpg_ids(arg):
    return [to_gpg_id(i) for i in to_list(arg)]

def to_snake_case(key):
    return '_'.join(key.strip().lower().split())

# define the schema for the settings file
schema = Schema(
    {
        Required('my_gpg_ids'): to_gpg_ids,
        'sign_with': to_gpg_id,
        'avendesora_gpg_passphrase_account': to_str,
        'avendesora_gpg_passphrase_field': to_str,
        'name_template': to_str,
        Required('recipients'): {
            Extra: {
                Required('category'): to_ident,
                Required('email'): to_emails,
                'gpg_id': to_gpg_id,
                'attach': to_paths,
                'networth': to_ident,
            }
        },
    },
    extra = REMOVE_EXTRA
)

# this function implements references
def expand_settings(value):
    # allows macro values to be defined as a top-level setting.
    # allows macro reference to be found anywhere.
    if isinstance(value, str):
        value = value.strip()
        if value[:1] == '@':
            value = settings[to_snake_case(value[1:])]
        return value
    if isinstance(value, dict):
        return {k:expand_settings(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_settings(v) for v in value]
    raise NotImplementedError(value)

def normalize_key(key, parent_keys):
    if parent_keys != ('recipients',):
        # normalize all keys except the recipient names
        return to_snake_case(key)
    return key

try:
    # Read settings
    config_filepath = Path('postmortem.nt')
    if config_filepath.exists():

        # load from file
        settings = nt.load(
            config_filepath,
            keymap = (keymap:={}),
            normalize_key = normalize_key
        )

        # expand references
        settings = expand_settings(settings)

        # check settings and transform to desired types
        settings = schema(settings)

        # show the resulting settings
        pprint(settings)

except nt.NestedTextError as e:
    e.report()
except MultipleInvalid as e:
    for err in e.errors:
        kind = 'key' if 'key' in err.msg else 'value'
        culprit = nt.join_keys(err.path, keymap=keymap)
        print(f"ERROR: {config_filepath!s}: {culprit}: {err.msg}.")
        try:
            print(keymap[tuple(err.path)].as_line(kind))
        except KeyError:
            pass
except OSError as e:
    print(f"ERROR: {config_filepath!s}: {e!s}")

This code uses expand_settings to implement references, and it uses the Voluptuous schema to clean and validate the settings and convert them to convenient forms. Because the keys are normalized to snake case when the file is loaded, the schema lists them in that form. As an example of the conversions, the user could specify attach as a string or a list, and the members could use a leading ~ to signify a home directory. Applying to_paths in the schema converts whatever is specified to a list and converts each member to a pathlib path with the ~ properly expanded.

Notice that the schema is defined in a different manner than in the above examples. In those, you simply state which type you are expecting for the value and use the Coerce function to indicate that the value should be cast to that type if needed. In this example, simple functions are passed in that perform validation and coercion as needed. This is a more flexible approach and allows better control of the error messages.

Here are the processed settings:

{'my_gpg_ids': ['odin@norse-gods.com'],
 'name_template': '{name}-{now:YYMMDD}',
 'recipients': {'Frigg': {'attach': [PosixPath('/home/ken/home/estate/trust.pdf'),
                                     PosixPath('/home/ken/home/estate/will.pdf'),
                                     PosixPath('/home/ken/home/estate/deed-valhalla.pdf')],
                          'category': 'wife',
                          'email': ['frigg@norse-gods.com'],
                          'networth': 'odin'},
                'Loki': {'attach': [PosixPath('/home/ken/home/estate/trust.pdf'),
                                    PosixPath('/home/ken/home/estate/will.pdf'),
                                    PosixPath('/home/ken/home/estate/deed-valhalla.pdf')],
                         'category': 'kids',
                         'email': ['loki@norse-gods.com']},
                'Thor': {'attach': [PosixPath('/home/ken/home/estate/trust.pdf'),
                                    PosixPath('/home/ken/home/estate/will.pdf'),
                                    PosixPath('/home/ken/home/estate/deed-valhalla.pdf')],
                         'category': 'kids',
                         'email': ['thor@norse-gods.com']}},
 'sign_with': 'odin@norse-gods.com'}
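
The function-style validators used in the schema are ordinary Python functions, so they can be tried without a schema at all. The sketch below repeats to_list and to_paths and defines a local Invalid exception as a stand-in for the one imported from voluptuous, so that it runs with the standard library alone:

```python
from pathlib import Path

class Invalid(Exception):
    # Stand-in for the voluptuous exception of the same name.
    pass

def to_list(arg):
    # A string is split on white space; a list is passed through.
    if isinstance(arg, str):
        return arg.split()
    if isinstance(arg, dict):
        raise Invalid('expected list')
    return arg

def to_paths(arg):
    # Each member becomes a Path with any leading ~ expanded.
    return [Path(p).expanduser() for p in to_list(arg)]

from_string = to_paths('trust.pdf will.pdf')
from_list = to_paths(['trust.pdf', 'will.pdf'])
print(from_string == from_list)

try:
    to_paths({'trust': 'trust.pdf'})
except Invalid as e:
    message = str(e)
    print(message)
```

Each function either returns the cleaned value or raises Invalid with a message of its own, which is what gives this style its control over error reporting.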