author: Paul Duncan <pabs@pablotron.org> 2019-02-05 16:37:11 -0500
committer: Paul Duncan <pabs@pablotron.org> 2019-02-05 16:37:11 -0500
commit: 524d4c9b225f185c90392f206992ea7e53cb4f1c
tree: cbdb1640303b415dcde0534bb44deccb14c0b123
parent: 61e10dccf3455e43f53489e3fae65344a1092f1c
README.md: improve "Data File" section, add more benchmarks
-rw-r--r-- README.md 23
-rw-r--r-- tests/bm.csv 2
2 files changed, 14 insertions, 11 deletions
diff --git a/README.md b/README.md
index 4554d22..ea107cd 100644
--- a/README.md
+++ b/README.md
@@ -55,17 +55,18 @@ Data File Format
Reads and writes newline-delimited plain text files in the following
format:
-* Lines are delimited by newlines
-* Each line is a record.
-* Record fields are delimited by whitespace.
-* The first row specifies the *shape* of the remaining rows as two
- unsigned integers. The first unsigned integer -- `num_floats` --
- indicates the number of floating point columns per row, and the second
- unsigned integer -- `num_ints` -- indicates the number of signed
- integer values per row.
-* The remaining lines contain data rows. Each row consists of
- `num_floats` floating point numbers, followed by `num_ints` signed
- integer values.
+* Each line is a row.
+* Each row consists of one or more columns, delimited by a space.
+* Columns are floating point or integer values.
+* The first row is called the header row.
+* The header row contains two unsigned integer columns which indicate
+ the layout of the remaining rows.
+* The first header row column indicates the number of floating
+ point columns per row (`num_floats`).
+* The second header row column indicates the number of integer
+ columns per row (`num_ints`).
+* The remaining rows contain `num_floats` floating point
+ columns, followed by `num_ints` signed integer columns.
Example data file:
diff --git a/tests/bm.csv b/tests/bm.csv
index d6e2a19..c44e31a 100644
--- a/tests/bm.csv
+++ b/tests/bm.csv
@@ -2,4 +2,6 @@ file,real_time
c4-1e5-0.dat,0m30.959s
c4-1e5-0.dat,0m29.414s
c4-1e5-0.dat,0m28.033s
+c4-1e5-0.dat,0m19.917s
+c4-1e5-0.dat,0m21.160s
c9-1e6-0.dat,6m19.640s
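
The data file format described in the README hunk above can be illustrated with a small reader. This is only a sketch in Python, not the project's own parser: the function name `parse_data_file` and the sample input are hypothetical, and it assumes that blank lines may be skipped and that columns may be split on any run of whitespace, neither of which the README states.

```python
def parse_data_file(text):
    """Parse a data file: a header row followed by data rows.

    Returns (num_floats, num_ints, rows), where each row is a
    (floats, ints) pair of lists.
    """
    # Assumption: blank lines are ignored rather than treated as errors.
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("missing header row")

    # Header row: two unsigned integers, num_floats then num_ints.
    header = lines[0].split()
    if len(header) != 2:
        raise ValueError("header row must contain exactly two columns")
    num_floats, num_ints = int(header[0]), int(header[1])
    if num_floats < 0 or num_ints < 0:
        raise ValueError("header row values must be unsigned")

    rows = []
    for row_index, line in enumerate(lines[1:], start=1):
        cols = line.split()
        if len(cols) != num_floats + num_ints:
            raise ValueError(
                f"data row {row_index}: expected "
                f"{num_floats + num_ints} columns, got {len(cols)}"
            )
        # The floating point columns come first, then the signed integers.
        floats = [float(col) for col in cols[:num_floats]]
        ints = [int(col) for col in cols[num_floats:]]
        rows.append((floats, ints))

    return num_floats, num_ints, rows


# Hypothetical sample: two float columns and one integer column per row.
sample = "2 1\n0.5 1.5 3\n-2.0 4.25 -7\n"
num_floats, num_ints, rows = parse_data_file(sample)
```

With the sample above, `rows` holds two entries, each pairing a list of two floats with a list of one integer. A row whose column count does not match the header raises `ValueError`.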