Specifically, I did simple experiments in several languages, including Ruby, Python, Lua, Rust, Go, D, and Nim. In one I just read lines and printed them back out: a line-oriented cat. In another I consumed input lines like x=1,y=2,z=3 one at a time, split them on commas and equals signs to populate hash maps, transformed them (e.g. removed the y field), and emitted them: basically mlr cut -x -f y with DKVP format. I didn't do anything fancy; I just used each language's getline, string-split, hashmap-put, etc. Nothing was as fast as C, so I used C. Here are the experiments I kept (I failed to keep the Lua code, for example): C cat, another C cat, D cat, Go cat, another Go cat, Rust cat, Nim cat, D cut, Go cut, Nim cut.
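To make the second experiment concrete, here is a minimal C sketch of the cut-style transformation on DKVP lines. It is not the code from the experiments listed above: the function names dkvp_cut_exclude and dkvp_cut_stream are hypothetical, the sketch filters key=value pairs directly instead of populating a hash map, and it assumes lines fit in a fixed 4096-byte buffer.

```c
#include <stdio.h>
#include <string.h>

// Copies `line` into `out`, dropping every key=value pair whose key equals
// `excluded_key`. Pairs are comma-separated, as in DKVP.
// `out` must have room for at least strlen(line) + 1 bytes.
// Hypothetical helper for illustration; not a Miller function.
void dkvp_cut_exclude(const char* line, const char* excluded_key, char* out) {
    size_t key_len = strlen(excluded_key);
    char* pout = out;
    const char* p = line;
    while (*p != '\0') {
        // The current pair runs up to the next comma or the end of the line.
        size_t pair_len = strcspn(p, ",");
        // A pair is excluded when it starts with the key followed by '='.
        int excluded = pair_len > key_len
            && strncmp(p, excluded_key, key_len) == 0
            && p[key_len] == '=';
        if (!excluded) {
            if (pout != out)
                *pout++ = ',';
            memcpy(pout, p, pair_len);
            pout += pair_len;
        }
        p += pair_len;
        if (*p == ',')
            p++;
    }
    *pout = '\0';
}

// Reads lines from `in`, removes the excluded field from each, and writes
// the result to `out`. Assumes each line fits in the fixed-size buffer.
void dkvp_cut_stream(FILE* in, FILE* out, const char* excluded_key) {
    char line[4096];
    char result[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0'; // strip the trailing newline
        dkvp_cut_exclude(line, excluded_key, result);
        fprintf(out, "%s\n", result);
    }
}
```

Calling dkvp_cut_stream(stdin, stdout, "y") from a main function gives a rough stand-in for mlr cut -x -f y on DKVP input.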
One of Go’s most powerful features is the ease with which it allows quick-to-code, error-free concurrency. Yet Miller, like most high-volume text-processing tools, spends most of its time obtaining and parsing input strings and negligible time doing all subsequent processing. Thus the absence of in-process multiprocessing is only a slight penalty in this particular application domain — parallelism here is more easily achieved by running multiple single-threaded processes, each handling its own input files, either on a single host or split across multiple hosts.
C++ code like
class MyClass {
private:
    char* a;
public:
    MyClass(char* a) {
        this->a = strdup(a);
    }
    ~MyClass() {
        free(a);
    }
    int myMethod(char* b) {
        return strlen(a) + strlen(b);
    }
};
...
MyClass* myObj = new MyClass("hello");
int x = myObj->myMethod("world");
results in something like
void MyClass$constructorcharptr(MyClass* this, char* a) {
    this->a = strdup(a);
}
void MyClass$destructor(MyClass* this) {
    free(this->a);
}
int MyClass$myMethod(MyClass* this, char* b) {
    return strlen(this->a) + strlen(b);
}
// Storage is allocated first; the constructor then initializes it.
MyClass* myObj = malloc(sizeof(MyClass));
MyClass$constructorcharptr(myObj, "hello");
int x = MyClass$myMethod(myObj, "world");
It’s easy enough to imitate this in C: simply use the coding convention of prepending the class name to all method names, and passing the this-pointer as the first argument to each method.
Miller uses precisely this approach. For example:
typedef struct _lrec_t {
    ...
} lrec_t;
// Constructors
lrec_t* lrec_csv_alloc(...) {
    lrec_t* prec = malloc(sizeof(lrec_t));
    ...
    prec->attribute = ...;
    return prec;
}
lrec_t* lrec_dkvp_alloc(...) {
    ...
}
// Destructor
void lrec_free(lrec_t* prec) {
    ...
    free(prec->attribute);
    ...
    free(prec);
}
// Methods
int lrec_foo(lrec_t* prec, ...) {
    return prec->...;
}
void lrec_bar(lrec_t* prec, ...) {
    prec->...;
}
This implements the object-oriented principle of encapsulation.
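As a minimal sketch, the same convention can be applied to the MyClass example to give compilable C. The names my_class_t, my_class_alloc, my_class_free, and my_class_my_method are hypothetical and chosen only for illustration; they are not part of Miller. The sketch uses malloc and strcpy rather than strdup so that it relies only on standard C library functions.

```c
#include <stdlib.h>
#include <string.h>

// Attributes of the "class".
typedef struct _my_class_t {
    char* a;
} my_class_t;

// Constructor: allocates the object and keeps a private copy of the string.
// Returns NULL if allocation fails.
my_class_t* my_class_alloc(const char* a) {
    my_class_t* pobj = malloc(sizeof(my_class_t));
    if (pobj == NULL)
        return NULL;
    pobj->a = malloc(strlen(a) + 1);
    if (pobj->a == NULL) {
        free(pobj);
        return NULL;
    }
    strcpy(pobj->a, a);
    return pobj;
}

// Destructor: releases the attribute, then the object itself.
void my_class_free(my_class_t* pobj) {
    if (pobj == NULL)
        return;
    free(pobj->a);
    free(pobj);
}

// Method: the object pointer plays the role of C++'s implicit this-pointer.
size_t my_class_my_method(const my_class_t* pobj, const char* b) {
    return strlen(pobj->a) + strlen(b);
}
```

A caller would write my_class_t* pobj = my_class_alloc("hello"); then my_class_my_method(pobj, "world"); and finally my_class_free(pobj);, mirroring the new, method call, and delete of the C++ version.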
#include <stdio.h>
#include <containers/lrec.h>
typedef lrec_t* reader_func_t(FILE* fp, void* pvstate, context_t* pctx);
typedef void reset_func_t(void* pvstate);
typedef void reader_free_func_t(void* pvstate);
typedef struct _reader_t {
    void* pvstate;
    reader_func_t* preader_func;    // Interface method
    reset_func_t* preset_func;      // Interface method
    reader_free_func_t* pfree_func; // Interface method
} reader_t;
A class implementing this interface might look like
// Attributes are private to this file
typedef struct _reader_csv_state_t {
    ...
} reader_csv_state_t;
// Implementation of interface methods. Marked static (file-scope) to not
// pollute the global namespace; exposed only via function pointers.
static lrec_t* reader_csv_func(FILE* input_stream, void* pvstate, context_t* pctx) {
    reader_csv_state_t* pstate = pvstate;
    ... use various pstate->attributes ...
}
static void reset_csv_func(void* pvstate) {
    reader_csv_state_t* pstate = pvstate;
    ... use various pstate->attributes ...
}
static void reader_csv_free(void* pvstate) {
    reader_csv_state_t* pstate = pvstate;
    ... use various pstate->attributes ...
}
// Constructor
reader_t* reader_csv_alloc(...) {
    reader_t* preader = mlr_malloc_or_die(sizeof(reader_t));
    reader_csv_state_t* pstate = mlr_malloc_or_die(sizeof(reader_csv_state_t));
    ... set various pstate->attributes ...
    preader->pvstate = (void*)pstate;
    preader->preader_func = &reader_csv_func;
    preader->preset_func = &reset_csv_func;
    preader->pfree_func = &reader_csv_free;
    return preader;
}
// Factory method
...
reader_t* preader = reader_csv_alloc(...);
...
// Method call
...
lrec_t* pinrec = preader->preader_func(input_stream, preader->pvstate, pctx);
...
This implements the object-oriented principles of polymorphism and runtime binding.
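The function-pointer pattern above can be sketched as a small, complete example with two implementations behind one interface. Everything here is hypothetical and for illustration only: formatter_t, formatter_pair_alloc, formatter_column_alloc, and formatter_free are not Miller types or functions, and the example formats a single key-value pair rather than reading records.

```c
#include <stdio.h>
#include <stdlib.h>

// Interface: a formatter turns one key-value pair into text.
typedef int format_func_t(void* pvstate, const char* key, const char* value,
    char* out, size_t outlen);

typedef struct _formatter_t {
    void* pvstate;               // Implementation-private state
    format_func_t* pformat_func; // Interface method
} formatter_t;

// Implementation 1: key, separator character, value.
typedef struct _formatter_pair_state_t {
    char separator;
} formatter_pair_state_t;

static int formatter_pair_func(void* pvstate, const char* key,
    const char* value, char* out, size_t outlen)
{
    formatter_pair_state_t* pstate = pvstate;
    return snprintf(out, outlen, "%s%c%s", key, pstate->separator, value);
}

// Implementation 2: key left-justified in a fixed-width column, then value.
typedef struct _formatter_column_state_t {
    int key_width;
} formatter_column_state_t;

static int formatter_column_func(void* pvstate, const char* key,
    const char* value, char* out, size_t outlen)
{
    formatter_column_state_t* pstate = pvstate;
    return snprintf(out, outlen, "%-*s %s", pstate->key_width, key, value);
}

// Shared helper: wraps a state pointer and a method in an interface object.
static formatter_t* formatter_wrap(void* pvstate, format_func_t* pfunc) {
    formatter_t* pformatter = malloc(sizeof(formatter_t));
    if (pformatter == NULL || pvstate == NULL) {
        free(pformatter);
        free(pvstate);
        return NULL;
    }
    pformatter->pvstate = pvstate;
    pformatter->pformat_func = pfunc;
    return pformatter;
}

// Constructors: each binds its own state type and function pointer.
formatter_t* formatter_pair_alloc(char separator) {
    formatter_pair_state_t* pstate = malloc(sizeof(formatter_pair_state_t));
    if (pstate != NULL)
        pstate->separator = separator;
    return formatter_wrap(pstate, formatter_pair_func);
}

formatter_t* formatter_column_alloc(int key_width) {
    formatter_column_state_t* pstate = malloc(sizeof(formatter_column_state_t));
    if (pstate != NULL)
        pstate->key_width = key_width;
    return formatter_wrap(pstate, formatter_column_func);
}

// Destructor: the states in this sketch hold no nested allocations.
void formatter_free(formatter_t* pformatter) {
    if (pformatter == NULL)
        return;
    free(pformatter->pvstate);
    free(pformatter);
}
```

The calling code is identical for both implementations: pformatter->pformat_func(pformatter->pvstate, key, value, buf, sizeof buf). Which function runs is decided at run time by whichever constructor created the object.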
More details are at
https://github.com/johnkerl/miller/tree/master/c/containers.