Alternatives
There is no shortage of well-established alternatives to NestedText for storing data in a human-readable text file. The features and shortcomings of some of these alternatives are discussed next. NestedText is intended to be used in situations where people create, modify, or consume the data directly. It is this perspective that informs these comparisons.
JSON
JSON is a subset of JavaScript suitable for holding data. Like NestedText, it consists of a hierarchical collection of objects (dictionaries), lists, and strings, but also allows reals, integers, Booleans and nulls. In practice, JSON is largely generated and consumed by machines. The data is stored as text, and so can be read, modified, and consumed directly by the end user, but the format is not optimized for this use case and so is often cumbersome or inefficient when used in this manner.
JSON supports all the native data types common to most languages. Syntax is added to values to unambiguously indicate their type. For example, 2, 2.0, and "2" are three different values with three different types (integer, real, string). This adds two types of complexity. First, the rules for distinguishing various types must be learned and used. Second, all strings must be quoted, and with quoting comes escaping, which is needed to allow quote characters to be included in strings.
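As a minimal sketch, Python's standard json module can be used to see both effects: the decoration on a value determines its type, and quoting forces escaping. The sample values are illustrative only.

```python
import json

# The same digit decorated three ways yields three different types.
values = json.loads('[2, 2.0, "2"]')
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'float', 'str']

# A string containing quote characters and a line break must be escaped.
text = 'She said "hi"\nthen left'
encoded = json.dumps(text)
print(encoded)  # "She said \"hi\"\nthen left"
```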
JSON was derived as a subset of JavaScript, and so inherits a fair amount of syntactic clutter that can be annoying for users to enter and maintain. In addition, features that would improve clarity are lacking. Comments are not allowed, multiline strings are not supported, and whitespace is insignificant (leading to the possibility that the appearance of the data may not match its true structure).
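The following sketch, again using Python's standard json module with made-up sample text, illustrates two of these points: a comment makes the document invalid, and the visual layout has no bearing on the decoded structure.

```python
import json

# A trailing comment makes the document invalid JSON.
commented = '{"name": "Fumiko Purvis"}  // treasurer'
try:
    json.loads(commented)
    comment_accepted = True
except json.JSONDecodeError:
    comment_accepted = False
print(comment_accepted)  # False

# Whitespace carries no meaning: both layouts decode to the same data,
# even though the first suggests that "phone" is nested under "name".
misleading = '{"name": "Fumiko Purvis",\n        "phone": "1-268-555-0280"}'
compact = '{"name": "Fumiko Purvis", "phone": "1-268-555-0280"}'
same_data = json.loads(misleading) == json.loads(compact)
print(same_data)  # True
```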
NestedText only supports three data types (strings, lists and dictionaries) and does not have the baggage of being the subset of a general purpose programming language. The result is a simpler language that has the following clear advantages over JSON as a human readable and writable data file format:
strings do not require quotes
comments
multiline strings
no need to escape special characters
commas are not used to separate dictionary and list items
The following examples illustrate the difference between JSON and NestedText:
JSON:
{
    "treasurer": {
        "name": "Fumiko Purvis",
        "address": "3636 Buffalo Ave\nTopeka, Kansas 20692",
        "phone": "1-268-555-0280",
        "email": "fumiko.purvis@hotmail.com",
        "additional roles": [
            "accounting task force"
        ]
    }
}
NestedText:
treasurer:
    name: Fumiko Purvis
        # Fumiko's term is ending at the end of the year.
        # She will be replaced by Merrill Eldridge.
    address:
        > 3636 Buffalo Ave
        > Topeka, Kansas 20692
    phone: 1-268-555-0280
    email: fumiko.purvis@hotmail.com
    additional roles:
        - accounting task force
YAML
YAML is considered by many to be a human friendly alternative to JSON. There is less syntactic clutter and the quoting of strings is optional. However, it also supports a wide variety of data types and formats. The optional quoting can result in the type of values being ambiguous. To distinguish between the various types, a complicated and non-intuitive set of rules developed. YAML at first appears very appealing when used with simple examples, but things can quickly become complicated or provide unexpected results. A reaction to this is the use of YAML subsets, such as StrictYAML. However, the subsets still try to maintain compatibility with YAML and so inherit much of its complexity. For example, both YAML and StrictYAML support nine different ways of writing multiline strings.
YAML avoids excessive quoting and supports comments and multiline strings, but the multitude of formats and disambiguation rules make YAML a difficult language to learn, and the ambiguities create traps for the user. To illustrate these points, the following is a condensation of a YAML document taken from the GitHub documentation that describes how to configure continuous integration using Python:
YAML:
name: Python package
on: [push]
build:
  python-version: [3.6, 3.7, 3.8, 3.9, 3.10]
  steps:
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Test with pytest
      run: |
        pytest
And here is the result of running that document through a YAML reader and writer. One would expect the format to change a bit but the information conveyed to remain unchanged.
YAML (round-trip):
name: Python package
true:
- push
build:
  python-version:
  - 3.6
  - 3.7
  - 3.8
  - 3.9
  - 3.1
  steps:
  - name: Install dependencies
    run: 'python -m pip install --upgrade pip

      pip install pytest

      if [ -f requirements.txt ]; then pip install -r requirements.txt; fi

      '
  - name: Test with pytest
    run: 'pytest

      '
There are a few things to notice about this second version.
The on key was inappropriately converted to true.
Python version 3.10 was inappropriately converted to 3.1.
The multiline strings were converted to an even more obscure format.
Indentation is not an accurate reflection of nesting (notice that python-version and - 3.6 have the same indentation, but - 3.6 is contained inside python-version).
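The first two conversions follow from implicit typing. The toy function below is not a YAML parser; it is a hypothetical sketch (the name guess_type and its word lists are assumptions made for illustration) that mimics a small subset of YAML 1.1-style rules to show how unquoted text can silently become a Boolean or a number.

```python
# Toy illustration only: a tiny subset of YAML 1.1-style implicit typing.
# Real YAML readers apply many more rules than this.
TRUE_WORDS = {"yes", "on", "true"}
FALSE_WORDS = {"no", "off", "false"}

def guess_type(scalar):
    """Return the value an implicitly typed reader might produce."""
    lowered = scalar.lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    for convert in (int, float):
        try:
            return convert(scalar)
        except ValueError:
            pass
    return scalar  # anything unrecognized stays a string

print(guess_type("on"))    # True
print(guess_type("3.10"))  # 3.1
print(guess_type("push"))  # push
```

With no implicit typing, every one of these values would simply remain the text that was typed.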
Now consider the NestedText version; it is simpler and not subject to misinterpretation.
NestedText:
name: Python package
on:
  - push
build:
  python-version: [3.6, 3.7, 3.8, 3.9, 3.10]
  steps:
    -
      name: Install dependencies
      run:
        > python -m pip install --upgrade pip
        > pip install pytest
        > if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    -
      name: Test with pytest
      run: pytest
NestedText was inspired by YAML, but eschews its complexity. It has the following clear advantages over YAML as a human readable and writable data file format:
simple
unambiguous (no implicit typing)
no unexpected conversions of the data
syntax is insensitive to special characters within text
safe, no risk of malicious code execution
TOML or INI
TOML is a configuration file format inspired by the well-known INI syntax. It supports a number of basic data types (notably including dates and times) using syntax that is more similar to JSON (explicit but verbose) than to YAML (succinct but confusing). As discussed previously, though, this makes it the responsibility of the user to specify the correct type for each field.
Another flaw in TOML is that it is difficult to specify deeply nested structures. The only way to specify a nested dictionary is to give the full key to that dictionary, relative to the root of the entire hierarchy. This is not much of a problem if the hierarchy has only one or two levels, but with any more than that you find yourself typing the same long keys over and over. A corollary to this is that TOML-based configurations do not scale well: increases in complexity are often accompanied by disproportionate decreases in readability and writability.
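The repetition can be made concrete with a short sketch. The helper table_headers and the sample configuration are hypothetical, and the sketch assumes keys that need no quoting; it simply lists the dotted header that would have to be typed for each nested dictionary.

```python
def table_headers(data, prefix=()):
    """Yield the dotted header needed for each nested dictionary."""
    for key, value in data.items():
        if isinstance(value, dict):
            path = prefix + (key,)
            yield "[" + ".".join(path) + "]"
            yield from table_headers(value, path)

# Hypothetical three-level configuration, used only for illustration.
config = {
    "server": {
        "logging": {
            "handlers": {"level": "debug"},
            "filters": {"name": "app"},
        },
    },
}

headers = list(table_headers(config))
print("\n".join(headers))
# [server]
# [server.logging]
# [server.logging.handlers]
# [server.logging.filters]
```

Each additional level lengthens every header beneath it, whereas an indentation-based format states each key once.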
Here is an example of a configuration file in TOML and NestedText:
TOML:
[plugins]
auth = ['avendesora']
archive = ['ssh', 'gpg', 'avendesora', 'emborg', 'file']
publish = ['scp', 'mount']

[auth.avendesora]
account = 'login'
field = 'passcode'

[archive.file]
src = ['~/src/nfo/contacts']

[archive.avendesora]

[archive.emborg]
config = 'rsync'

[publish.scp]
host = ['backups']
remote_dir = 'archives/{date:YYMMDD}'

[publish.mount]
drive = '/mnt/secrets'
remote_dir = 'sparekeys/{date:YYMMDD}'
NestedText:
plugins:
    auth:
        - avendesora
    archive:
        - ssh
        - gpg
        - avendesora
        - emborg
        - file
    publish:
        - scp
        - mount
auth:
    avendesora:
        account: login
        field: passcode
archive:
    file:
        src:
            - ~/src/nfo/contacts
    avendesora:
        {}
    emborg:
        config: rsync
publish:
    scp:
        host:
            - backups
        remote_dir: archives/{date:YYMMDD}
    mount:
        drive: /mnt/secrets
        remote_dir: sparekeys/{date:YYMMDD}
NestedText has the following clear advantages over TOML and INI as a human readable and writable data file format:
text does not require quoting or escaping
data is left in its original form
indentation used to succinctly represent nested data
the structure of the file matches the structure of the data
heavily nested data is represented efficiently
CSV or TSV
CSV (comma-separated values) and the closely related TSV (tab-separated values) are exchange formats for tabular data. Tabular data consists of multiple records where each record is made up of a consistent set of fields. The format separates the records using line breaks and separates the fields using commas or tabs. Quoting and escaping are required when the fields contain line breaks or commas/tabs.
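A minimal sketch using Python's standard csv module shows the quoting that appears as soon as a field contains a line break or a comma. The record contents are illustrative.

```python
import csv
import io

# One record with two fields: the first contains a line break and a
# comma, the second contains a comma.
record = ["3636 Buffalo Ave\nTopeka, Kansas 20692", "Purvis, Fumiko"]

buffer = io.StringIO()
csv.writer(buffer).writerow(record)
encoded = buffer.getvalue()
print(repr(encoded))  # both fields are wrapped in double quotes

# Reading the text back recovers the original fields.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))
print(decoded == record)  # True
```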
Here is an example data file in CSV and NestedText.
CSV:
Year,Agriculture,Architecture,Art and Performance,Biology,Business,Communications and Journalism,Computer Science,Education,Engineering,English,Foreign Languages,Health Professions,Math and Statistics,Physical Sciences,Psychology,Public Administration,Social Sciences and History
1970,4.22979798,11.92100539,59.7,29.08836297,9.064438975,35.3,13.6,74.53532758,0.8,65.57092343,73.8,77.1,38,13.8,44.4,68.4,36.8
1980,30.75938956,28.08038075,63.4,43.99925716,36.76572529,54.7,32.5,74.98103152,10.3,65.28413007,74.1,83.5,42.8,24.6,65.1,74.6,44.2
1990,32.70344407,40.82404662,62.6,50.81809432,47.20085084,60.8,29.4,78.86685859,14.1,66.92190193,71.2,83.9,47.3,31.6,72.6,77.6,45.1
2000,45.05776637,40.02358491,59.2,59.38985737,49.80361649,61.9,27.7,76.69214284,18.4,68.36599498,70.9,83.5,48.2,41,77.5,81.1,51.8
2010,48.73004227,42.06672091,61.3,59.01025521,48.75798769,62.5,17.6,79.61862451,17.2,67.92810557,69,85,43.1,40.2,77,81.7,49.3
NestedText:
-
    Year: 1970
    Agriculture: 4.22979798
    Architecture: 11.92100539
    Art and Performance: 59.7
    Biology: 29.08836297
    Business: 9.064438975
    Communications and Journalism: 35.3
    Computer Science: 13.6
    Education: 74.53532758
    Engineering: 0.8
    English: 65.57092343
    Foreign Languages: 73.8
    Health Professions: 77.1
    Math and Statistics: 38
    Physical Sciences: 13.8
    Psychology: 44.4
    Public Administration: 68.4
    Social Sciences and History: 36.8
-
    Year: 1980
    Agriculture: 30.75938956
    Architecture: 28.08038075
    Art and Performance: 63.4
    Biology: 43.99925716
    Business: 36.76572529
    Communications and Journalism: 54.7
    Computer Science: 32.5
    Education: 74.98103152
    Engineering: 10.3
    English: 65.28413007
    Foreign Languages: 74.1
    Health Professions: 83.5
    Math and Statistics: 42.8
    Physical Sciences: 24.6
    Psychology: 65.1
    Public Administration: 74.6
    Social Sciences and History: 44.2
-
    Year: 1990
    Agriculture: 32.70344407
    Architecture: 40.82404662
    Art and Performance: 62.6
    Biology: 50.81809432
    Business: 47.20085084
    Communications and Journalism: 60.8
    Computer Science: 29.4
    Education: 78.86685859
    Engineering: 14.1
    English: 66.92190193
    Foreign Languages: 71.2
    Health Professions: 83.9
    Math and Statistics: 47.3
    Physical Sciences: 31.6
    Psychology: 72.6
    Public Administration: 77.6
    Social Sciences and History: 45.1
-
    Year: 2000
    Agriculture: 45.05776637
    Architecture: 40.02358491
    Art and Performance: 59.2
    Biology: 59.38985737
    Business: 49.80361649
    Communications and Journalism: 61.9
    Computer Science: 27.7
    Education: 76.69214284
    Engineering: 18.4
    English: 68.36599498
    Foreign Languages: 70.9
    Health Professions: 83.5
    Math and Statistics: 48.2
    Physical Sciences: 41
    Psychology: 77.5
    Public Administration: 81.1
    Social Sciences and History: 51.8
-
    Year: 2010
    Agriculture: 48.73004227
    Architecture: 42.06672091
    Art and Performance: 61.3
    Biology: 59.01025521
    Business: 48.75798769
    Communications and Journalism: 62.5
    Computer Science: 17.6
    Education: 79.61862451
    Engineering: 17.2
    English: 67.92810557
    Foreign Languages: 69
    Health Professions: 85
    Math and Statistics: 43.1
    Physical Sciences: 40.2
    Psychology: 77
    Public Administration: 81.7
    Social Sciences and History: 49.3
NestedText has the following clear advantages over CSV and TSV as a human readable and writable data file format:
text does not require quoting or escaping
arbitrary data hierarchies are supported
file representation tends to be tall and skinny rather than short and fat
easier to read
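The relationship between the two layouts can be sketched with Python's standard csv module. The function to_tall_text is a hypothetical helper, not part of any NestedText library, and it assumes keys and values are simple single-line strings; it only illustrates how each wide CSV row becomes a tall block of key and value pairs.

```python
import csv
import io

# A two-column excerpt of the CSV data shown earlier.
csv_text = "Year,Agriculture\n1970,4.22979798\n1980,30.75938956\n"

def to_tall_text(rows, indent="    "):
    """Render simple records as a NestedText-style list of dictionaries.

    Simplified sketch: assumes keys and values are single-line strings
    that need no special handling.
    """
    lines = []
    for row in rows:
        lines.append("-")
        for key, value in row.items():
            lines.append(f"{indent}{key}: {value}")
    return "\n".join(lines)

rows = list(csv.DictReader(io.StringIO(csv_text)))
tall = to_tall_text(rows)
print(tall)
# -
#     Year: 1970
#     Agriculture: 4.22979798
# -
#     Year: 1980
#     Agriculture: 30.75938956
```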