- [-] front-end separated
- [X] git repository
- [X] Makefile
- [X] files renamed
- [-] XSLT rewritten
- [X] dependencies shaken out
- [X] provisional solution for main dispatcher (route by RDF type)
- [X] replace hard-coded pseudo-RDF for vocabs with real RDF/XML
- [X] fudge with static file for now
- (we want this to ultimately be generated but currently missing the e.g. caching infrastructure to make that efficient)
- [ ] new catalogue resolution mechanism
- [ ] handle potentially paginated inventories
- [ ] javascript reorganized
- [ ] decide where to hang all the app-specific scripts
- [ ] fix math on hyperbolic representation (deal with this last)
- [ ] (probably will need a whole new algorithm for final layout phase tbh)
- [-] create catalogue resources
- [ ] fix Intertwingler::Params
- [ ] create ~configure~ class method for parity with other subsystems that configure themselves out of the graph
- [ ] create ~refresh~ method for templates, groups, and the registry itself
- [ ] figure out why tf it’s ~refresh~ in ~Intertwingler::Params~ and ~refresh!~ in ~Params::Registry~
- [ ] whatever the reason, just standardize on one or the other
- [ ] create mechanism for sets and sequences
- [ ] read out of RDF graph
- [ ] create a “term” parameter value type
- [ ] serializes/canonicalizes to CURIE (if possible) using internal prefix map
- [ ] expands to URI (again, if possible)
- [X] create a mechanism for any resource to marshal Intertwingler::Params, not just transforms
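A minimal sketch of the “term” value type described above, assuming a plain hash as the internal prefix map. The names ~PREFIXES~, ~abbreviate~ and ~expand~ are made up for illustration and are not part of ~Intertwingler::Params~; both directions fall back to the input when no mapping applies, which is one reading of “if possible”.

```ruby
# Hypothetical prefix map; a real one would come out of the graph/config.
PREFIXES = {
  'rdf'  => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'foaf' => 'http://xmlns.com/foaf/0.1/',
}.freeze

# Canonicalize a URI to a CURIE when a known namespace matches;
# otherwise hand the URI back unchanged.
def abbreviate(uri, prefixes = PREFIXES)
  # prefer the longest matching namespace
  prefix, ns = prefixes
    .select { |_, n| uri.start_with?(n) }
    .max_by { |_, n| n.length }
  prefix ? "#{prefix}:#{uri.delete_prefix(ns)}" : uri
end

# Expand a CURIE to a full URI when the prefix is known;
# otherwise hand the input back unchanged.
def expand(curie, prefixes = PREFIXES)
  prefix, local = curie.split(':', 2)
  ns = prefixes[prefix]
  ns && local ? ns + local : curie
end
```

Round-tripping ~expand(abbreviate(uri))~ gives back the original URI whether or not a prefix matched.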
- [X] create Intertwingler::Resource
- [X] these represent individual resources with stateful config
- [X] generic ~call~ method to route requests
- [X] actual methods map to HTTP request methods
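The generic ~call~ idea can be sketched roughly like this. It is not the actual ~Intertwingler::Resource~; the class names, the ~METHODS~ list and the 405 fallback are assumptions for illustration, and the return value is a Rack-style triple.

```ruby
# Hypothetical sketch; not the actual Intertwingler::Resource.
class SketchResource
  # Request methods this sketch is willing to route.
  METHODS = %i[get head post put patch delete options].freeze

  # Route a request to the instance method named after its HTTP method.
  # Returns a Rack-style [status, headers, body] triple.
  def call(request_method, *args)
    name = request_method.to_s.downcase.to_sym
    unless METHODS.include?(name) && respond_to?(name)
      allowed = METHODS.select { |m| respond_to?(m) }.map { |m| m.to_s.upcase }
      return [405, { 'allow' => allowed.join(', ') }, []]
    end
    public_send(name, *args)
  end
end

# A resource only has to define the methods it actually supports.
class Hello < SketchResource
  def get(*)
    [200, { 'content-type' => 'text/plain' }, ['hello']]
  end
end
```

With this shape, ~Hello.new.call('GET')~ lands in ~get~, and any method the subclass does not define falls through to the 405 branch.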
- [ ] aggregated vocab resource
- [ ] just RDF/XML for now
- [ ] figure out some compute-once situation for all vocabs because it takes for-eeever to extract them and then forever again to render 3 megabytes of RDF/XML
- (note it is mainly schema dot org that is responsible for this and we don’t even use it)
- [ ] maybe consider some kind of subsetting/parametrization?
- [X] ~cgto:Index~ resource
- (points to different types of summary resources)
- [-] ~cgto:Summary~ resources
- (counts of resources and links to inventories)
- [X] by class
- asserted/inferred, counts/links
- [ ] by property
- asserted/inferred, domain/range, counts/links
- [-] ~cgto:Inventory~ resource
- (parametrized; used to enumerate actual resources in the graph)
- [X] update TFO vocab
- [X] default value
- [X] ~tfo:default~ property
- [X] handling of empty values (ignore vs null, empty string, etc)
- [X] ~tfo:empty~ property
- [X] put empty string in ~tfo:default~ if that’s what you want
- [X] ~tfo:universe~ property
- [X] some way to represent composite values
- [X] ~tfo:Composite~ class
- [X] ~tfo:element~ property
- [X] sequences (unbounded)
- [X] just put ~rdf:List~ as a composite
- [X] tuples (fixed length)
- [X] shift vs truncate policy
- [X] ~tfo:shift~ property
- [X] discrete/enumerated sets
- [X] ~rdf:Bag~ + ~rdfs:member~
- [X] numeric (or number-like, e.g. date) spans
- [X] bounded on either side?
- [X] include/exclude boundaries?
- [X] ~tfo:Range~ class
- [X] ~tfo:low~ property
- [X] ~tfo:high~ property
- [X] ~tfo:infimum~ property
- [X] ~tfo:supremum~ property
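A sketch of how a span with optional, inclusive-or-exclusive boundaries might behave once it is read out of the graph. The notes do not say which property is which, so the reading here is an assumption: ~low~/~high~ as inclusive bounds and ~infimum~/~supremum~ as exclusive ones. ~SketchRange~ is a made-up name.

```ruby
# Hypothetical value object; the field names echo the TFO properties, but the
# inclusive/exclusive reading of each one is an assumption.
SketchRange = Struct.new(:low, :high, :infimum, :supremum, keyword_init: true) do
  # True if the value falls inside every bound that is actually set.
  def include?(value)
    return false if low      && value <  low       # inclusive lower bound
    return false if infimum  && value <= infimum   # exclusive lower bound
    return false if high     && value >  high      # inclusive upper bound
    return false if supremum && value >= supremum  # exclusive upper bound
    true
  end
end
```

Leaving a field unset makes the range unbounded on that side, and anything comparable (numbers, dates) works as a value.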
- [X] determine how to represent a “term” type
- [ ] some way to assign parse/serialize, compose/decompose functions
- [ ] change parameter spec or generalize domain
- [ ] some kind of caching?
- [ ] pagination links
- [ ] ~/me~ resource
- (e.g. “my” ~sioc:UserAccount~ which eventually hooks up to ~foaf:Person~ etc)
- [ ] rework application state stuff so it is centered around the user
Around early August (2023) I decided to go with a much more ambitious design that does things I wasn’t initially planning on doing. For example, I wasn’t initially planning on doing the whole transform infrastructure, but I think it will be a much more powerful product to have them than not to.
- This is proto-MVP.
- The engine is the thing that resolves URIs, picks content handlers, and pipes requests/responses through transforms.
- The engine also has residual responsibility for all errors and redirects.
- As for the work outstanding, we’re mostly talking about a data structure that is highly dependent on a huge schwack of configuration data.
- In the interest of shipping, I’m also just going to have it poll the handlers in the configured order, even though the long-term idea is to have it do something smarter than that.
- The main issue here is how do we represent the massive amount of configuration we need?
- The answer is the Intertwingler Configuration Vocabulary as well as the Transformation Functions Ontology.
- [ ] thinking of implementing this as a ~configure~ class method on each of the relevant classes
- [ ] handle ~Params::Registry~ by making an ~Intertwingler~-specific subclass
- The issue is basically that TFO does a handy-dandy job of describing parameters (for the newly-minted ~Params::Registry~), and relating them to what it calls “transforms”, which are different from what ~Intertwingler~ calls a transform.
- An ~itcv:Transform~ is a subclass of ~itcv:Handler~, which can be thought of as a container for at least one resource, while a ~tfo:Transform~ is equivalent to one of those resources contained as such, like an individual service endpoint.
- What we want is to be able to specify ~tfo:Parameter~ entities and lists thereof to pass into the parameter registry, but the relations are too tight.
- I also don’t want TFO to depend on ITCV but ITCV can depend on TFO.
- Therefore:
- [ ] Make (or find) a suitable generic superclass for ~tfo:Transform~ that represents an individual service endpoint, and make ~tfo:Transform~ ~rdfs:subClassOf~ that.
- [ ] Add the necessary classes/relations to make ITCV able to use ~tfo:Parameter~ declarations.
- [ ] Create configuration language for the various handlers/transforms that need it:
- [ ] filesystem
- [ ] content-addressable store
- [ ] XSLT processing instruction transform
- Should we reuse ~tfo:Parameter~ on these too? probably.
- This means the abstract parameter-having superclass is gonna need to subsume handlers and individual resources within handlers.
- Should we bootstrap the configuration for the graph database itself?
- like point the command line program to an initial config RDF which loads into the in-memory store, finds the config for the persistent store, spins that up, then disgorges its contents into it?
- Not sure yet.
- Note that ~RDF::Repository~ has subclasses that take arbitrary parameters
- (we are initially interested in ~RDF::LMDB~ that has ~dir~ and ~mapsize~)
- (should note that ~Store::Digest~, at least the one driver I wrote, also uses LMDB, so it also needs ~dir~ and ~mapsize~)
- (the filesystem handler has to specify multiple directories in order so it’ll have to be a list or otherwise it’d reuse ~dir~ too)
- We’re gonna need a demo configuration after all.
- [ ] Write ~configure~ methods for the engine and handlers.
- I already have a few individual handlers and transforms running, now have to put them together.
- There are some ambiguities about how the resolver ought to behave that can’t be determined until the whole thing is online.
- In particular, how multiple path segments ought to be handled is unclear in the absence of ~ci:canonical~.
- set-theoretic like the old one? probably.
- (i.e., the ~/~ character is treated like an AND)
- do we nominate certain RDF classes as “containers” and/or certain properties as containment relations?
- more to the point, do we want to discount certain classes and properties from being interpreted as such?
- basic issue here is determining when to put a terminating ~/~ on the URL path: “containers” should get them, non-containers should not.
- Squashing to lowercase, also underscores to hyphens, etc
- I prefer hyphens over underscores but other people may not.
- also certain slugs may need to be preserved exactly.
- do we want to make that behaviour configurable?
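If the squashing does end up configurable, it could look something like the following. This is only a sketch: ~normalize_slug~ and its option names are invented here, and nothing in the resolver is committed to this interface.

```ruby
# Hypothetical helper; the option names are made up for illustration.
def normalize_slug(slug, downcase: true, hyphenate: true, preserve: [])
  return slug if preserve.include?(slug)  # slugs that must survive exactly
  out = slug.dup
  out = out.downcase if downcase          # squash to lowercase
  out = out.tr('_', '-') if hyphenate     # underscores to hyphens
  out
end
```

So ~normalize_slug('Foo_Bar')~ gives ~foo-bar~ by default, while people who prefer underscores can pass ~hyphenate: false~ and exact slugs go in ~preserve~.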
- There is currently no code for passing HTTP requests or entire responses into transforms
- [ ] write ~Intertwingler::Representation::HTTP~
- [ ] write request-transform harness
- [ ] write queue injection/manipulation code
- In the interest of shipping, this should just poll the handlers in the order they were configured.
- We can come around later and do the fancy handler prioritization code (which is gonna depend on the handler manifest protocol).
- This actually works on the test bench.
- [ ] write response transform harness (likely very similar to request transform harness)
- I have broken the list of handlers and transforms into MVP versus not, irrespective of the workload.
- [ ] ~Intertwingler::Handler::Generated~
- This is the basic handler for HTML/XML markup which is generated exclusively from the graph. It is mainly intended to be a stopgap until a Loupe processor becomes viable.
- [ ] with tests
- [ ] with documentation
- [ ] Devise sub-handler configuration/loading mechanism
- [ ] Also determine sub-handler interface
- [ ] Core sub-handlers
- Most of these have already been written for ~RDF::SAK~ so like the markup transforms, it’s mainly a matter of repackaging them.
- [ ] Generic (X)HTML+RDFa
- This will spit out a simple document centred around a subject in the graph, plus resources (and their labels) and literals adjacent to it, including blank nodes. The goal of this thing is to provide you with LEGO pieces to be composed at the network level downstream.
- [ ] with tests
- [ ] with documentation
- [ ] Atom feed
- This will take ~GET~ requests to container-like resources and return responses in ~application/atom+xml~.
- [ ] with tests
- [ ] with documentation
- [ ] Google site map
- This repackages lists of resources ~Intertwingler~ recognizes as “documents” into something Google can consume. It’s mainly here because it was in ~RDF::SAK~ and because it’s easy. A later version will probably be implemented as a transform over handler manifests.
- [ ] with tests
- [ ] with documentation
- [ ] Data Cube
- This one will take a ~qb:DataSet~, ~qb:Slice~, or ~qb:ObservationGroup~ and generate an HTML table.
- [ ] with tests
- [ ] with documentation
- Alphabetic lists
- These all follow the same pattern of just a long alphabetized list punctuated by initial-letter sections. Under the hood it’s mostly the same code.
- I18N/L10N is an issue here that I am totally punting on for the time being.
- [ ] SKOS concept scheme/collection
- This is a simple list broken into alphabetic buckets to handle ~skos:ConceptScheme~ and ~skos:Collection~ entities.
- [ ] with tests
- [ ] with documentation
- [ ] Bibliography
- This handler continues the alphabetic list tradition for bibliographic references.
- [ ] with tests
- [ ] with documentation
- [ ] Person/organization list
- Alphabetic list hat trick for ~foaf:Person~ and ~org:Organization~, etc.
- [ ] with tests
- [ ] with documentation
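The shared core of the alphabetic lists is probably not much more than the following sketch. ~alphabetize~ is a made-up name, labels are assumed to be plain strings, and (like the notes) it punts on I18N/L10N by bucketing on the first character as-is.

```ruby
# Sort labels case-insensitively and bucket them by upcased initial.
# Labels that start with a non-letter get a bucket for that character.
def alphabetize(labels)
  labels
    .sort_by(&:downcase)
    .group_by { |label| label[0].to_s.upcase }
end
```

Each key of the resulting hash becomes a section heading and each value the sorted list under it.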
- Interactive UI materials
- These sub-handlers are intended to provide raw materials for creating user interfaces, particularly where data entry is involved.
- (These are the only sub-handlers that need to be written from scratch, but they are dead simple.)
- [ ] All classes
- This will list all RDF classes known to ~Intertwingler~.
- [ ] with tests
- [ ] with documentation
- [ ] Adjacent properties (to subject)
- This will list all properties which are adjacent to a given class, or the class(es) of the subject. Can specify the direction, either ~rdfs:domain~ or ~rdfs:range~.
- [ ] with tests
- [ ] with documentation
- [ ] Adjacent class instances (to property)
- This will list all instances of classes which are adjacent to a given property.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Handler::CAS~
- This is a front end to ~Store::Digest::HTTP~ (itself a front end to ~Store::Digest~), a content-addressable store that registers blobs under multiple cryptographic digests at once, using RFC 6920 addresses.
- [ ] with tests
- [ ] with documentation
- [ ] ~/.well-known/ni/~ handles ~POST~ requests
- [ ] responds with redirect, either ~201 Created~ or ~303 See Other~
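For reference, this is roughly what an RFC 6920 address looks like when built by hand. The sketch only covers SHA-256 and uses nothing but the standard library; the helper names are invented and this is not how ~Store::Digest~ itself does it. RFC 6920 calls for base64url with the padding stripped, and maps ~ni:~ URIs onto ~/.well-known/ni/{alg}/{value}~ paths.

```ruby
require 'digest'

# base64url without padding, as RFC 6920 specifies for digest values
def b64url(bytes)
  [bytes].pack('m0').tr('+/', '-_').delete('=')
end

# The ni: URI (no authority) for a blob under SHA-256.
def ni_uri(blob)
  "ni:///sha-256;#{b64url(Digest::SHA256.digest(blob))}"
end

# The corresponding path under /.well-known/ni/, i.e. the kind of target
# a redirect from the POST would point at.
def well_known_path(blob)
  "/.well-known/ni/sha-256/#{b64url(Digest::SHA256.digest(blob))}"
end
```

A SHA-256 digest always encodes to 43 characters this way, so the paths are fixed-length.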
- [-] ~Intertwingler::Handler::FileSystem~
- This is a simple content-negotiating file system handler, mainly intended to smooth the transition to content-addressable storage.
- [ ] with tests
- [ ] with documentation
- [-] handles multiple document roots
- [X] does not venture outside of them
- [ ] skips dotfiles
- [X] configurable index basename
- [X] does content negotiation
- [X] treats ~slug~ (file) first and ~slug/~ (dir) second
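The lookup rules above (multiple roots in order, no escaping them, no dotfiles, file before directory index) can be sketched as follows. This is an illustration and not the handler’s code: ~resolve~ is a made-up name, content negotiation among variants is left out entirely, symlinks are not considered, and trying file-then-directory per root (rather than across all roots) is an assumption.

```ruby
# Hypothetical resolver: returns the first matching path or nil.
def resolve(roots, request_path, index: 'index')
  segments = request_path.split('/').reject(&:empty?)
  # refuse dotfiles, and with them any '.' or '..' segment that could
  # climb out of the document root
  return nil if segments.any? { |s| s.start_with?('.') }

  roots.each do |root|
    base = File.join(root, *segments)
    # slug (file) first...
    return base if File.file?(base)
    # ...then slug/ (directory) with the configured index basename
    idx = File.join(base, index)
    return idx if File.file?(idx)
  end
  nil
end
```

Because the roots are tried in order, an earlier root shadows a later one for the same slug.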
- [ ] ~Intertwingler::Handler::LDPatch~
- This thing only responds to ~PATCH~ requests with ~text/ldpatch~ bodies. Meant to be used in conjunction with the RDF-KV transform.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Representation~
- This is the monad-like thing that keeps a parsed version of an HTTP message body around so you can pass it through multiple transforms without having to waste resources serializing and reparsing it.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Representation::Nokogiri~
- This one handles XML/(X)HTML by parsing it with Nokogiri.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Representation::Vips~
- This one handles raster images by parsing them with Vips.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Representation::Rack~
- This one handles ~message/http~ bodies by parsing/serializing ~Rack::Request~ and ~Rack::Response~ objects.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform~
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::Markup~
- Most of these have already been written and the work is in refactoring them into transforms.
- [ ] with tests
- [ ] with documentation
- [ ] HTML ↔ XHTML transform
- [ ] with tests
- [ ] with documentation
- [ ] Strip comments transform
- [ ] with tests
- [ ] with documentation
- [ ] Rewrite ~<head>~ transform
- [ ] with tests
- [ ] with documentation
- [ ] Rehydrate transform
- [ ] with tests
- [ ] with documentation
- [ ] Add social media metadata transform
- [ ] with tests
- [ ] with documentation
- [ ] Add backlinks transform
- [ ] with tests
- [ ] with documentation
- [ ] Rewrite links transform
- [ ] with tests
- [ ] with documentation
- [ ] Mangle ~mailto:~ transform
- [ ] with tests
- [ ] with documentation
- [ ] Amazon tag transform
- [ ] with tests
- [ ] with documentation
- [ ] Normalize RDFa prefixes transform
- [ ] with tests
- [ ] with documentation
- [ ] Add ~xml-stylesheet~ PI transform
- [ ] with tests
- [ ] with documentation
- [ ] Apply XSLT transform
- [ ] with tests
- [ ] with documentation
- [ ] Reindent transform
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::Raster~
- [ ] with tests
- [ ] with documentation
- [ ] Conversion transform
- [ ] converts from one image file format to another; does nothing else
- [ ] with tests
- [ ] with documentation
- [ ] Crop transform
- [ ] with tests
- [ ] with documentation
- [ ] Scale transform
- [ ] with tests
- [ ] with documentation
- [ ] Desaturate transform
- [ ] with tests
- [ ] with documentation
- [ ] Posterize transform
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::Markdown~
- [ ] with tests
- [ ] with documentation
- [ ] Markdown hook transform
- [ ] with tests
- [ ] with documentation
- [ ] add ~text/markdown~ to ~Accept~
- [ ] hook the actual transform
- [ ] Markdown → (X)HTML transform
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::Sass~
- This is potentially our first candidate for stand-alone transform, since all Sass development has moved to Dart and is literally the only thing I know that has. Until then, we use the old Ruby Sass I guess (or maaaybe libsass bindings? No updates in years though.)
- [ ] with tests
- [ ] with documentation
- [ ] Sass hook transform
- This request transform makes it possible for downstream content negotiation to select Sass representations.
- [ ] with tests
- [ ] with documentation
- [ ] add ~text/x-vnd.sass~ and ~text/x-vnd.sass.scss~ to ~Accept~
- [ ] Sass transform
- This will take a Sass document and turn it into CSS.
- [ ] with tests
- [ ] with documentation
- [ ] Sass internal loader can fetch other Sass via subrequest
- [ ] ~Intertwingler::Transform::Input~
- There is nothing especially appropriate about lumping these resources together other than they are the only ones necessary for MVP that actually process input.
- [ ] with tests
- [ ] with documentation
- [ ] Pseudo-file ~PUT~ transform
- This will take a ~PUT~ request to an arbitrary resource and transform it into a ~POST~ to ~/.well-known/ni/~ (controlled by ~Store::Digest~), but only after recording the pseudo-file’s pseudo-path in the graph.
- I have been thinking about how to do this one more transactionally, since the content-addressable store is a separate module and not 100% guaranteed to be reliable.
- Rather than crud up the graph with fake file references to nothing, maybe have the request handler install a response handler that takes the ~201 Created~ with the redirect (the ordinary behaviour of ~Store::Digest::HTTP~ when you ~POST~ to ~/.well-known/ni/~), have it rewrite that response (or at least the ~Location:~ header), and in the process, glean the hash from the response (~/.well-known/ni/sha-256/whatever…~) and in the process of attaching
- [ ] with tests
- [ ] with documentation
- [ ] RDF-KV transform
- It really just has to spin up the ~Rack~ app at this stage.
- [ ] (as a stand-alone server or FastCGI or SCGI or whatever.)
- However, the CLI currently uses ~Commander~ and I would rather use ~Thor~ and ~TTY~ because I encountered some weird bugs with ~Commander~ in the past and those guys look way better organized.
- One thing ~Commander~ does do though is interactive shells with command completion, where you have access to the repertoire of commands inside the shell with all the parsing.
- Also, TTY finally has a pure-Ruby command completion working, which means no dependency on readline or whatever.
- The only caveat is that I don’t know how to expose the menu of ~Thor~ commands to a shell. Therefore:
- [ ] Research how (if) this can be done.
- Certain people have asked for one.
- [ ] make it so the state directory is a volume so you can get at it from outside the container.
- This would bring ~Intertwingler~ back to parity with the old ~RDF::SAK~.
- [ ] just start up the engine in a sandbox, obtain its manifest (via ~OPTIONS *~), then ~GET~ everything that is ~GET~-able, and save that to a directory.
-able, and save that to a directory. - [ ] push out the rewrite maps and whatever else.
- Running transformations for responses that can otherwise cache is going to suck performance-wise.
- Solution: use the content-addressable store for cache like I originally intended.
- Problem: the cache is gonna get really big, really fast.
- Solution: An LRU policy or better.
- Problem: if you mix persistent storage in the same store with cache and happen to lose the handle on the former, you aren’t gonna know what’s cache and what isn’t.
- Solution: if ~Store::Digest~ knew an object was cache, nothing else would have to keep track of it.
- Problem: what if you insert something with the same hash as a cached object, but you want it to be permanent?
- Solution: if an object is reinserted with the cache flag off, it should be impossible to flip on again without deleting the object and reinserting it (~Store::Digest~ has a distinction between “merely” deleting an object while preserving its metadata and “forgetting” it ever existed, but merely deleting should be satisfactory).
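The one-way cache flag can be pinned down with a toy model. This is an in-memory stand-in rather than ~Store::Digest~ code; the class name, the ~cache:~ keyword and the method names are all hypothetical.

```ruby
# Hypothetical in-memory stand-in for the metadata table.
class SketchStore
  Entry = Struct.new(:blob, :cache)

  def initialize
    @entries = {}
  end

  # Insert (or reinsert) a blob under its key. The cache flag can go from
  # true to false, but never back to true while the entry exists.
  def add(key, blob, cache: false)
    if (entry = @entries[key])
      entry.cache &&= cache
    else
      @entries[key] = Entry.new(blob, cache)
    end
    @entries[key]
  end

  # Deleting is the only way to make an object eligible to be cache again.
  def delete(key)
    @entries.delete(key)
  end

  def cache?(key)
    @entries.fetch(key).cache
  end
end
```

The upshot is that a permanent insert always wins over a cache insert of the same object, regardless of the order they arrive in.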
- Problem: adding a ~cache~ flag means changing the record layout for the metadata, which means anybody using ~Store::Digest~ is gonna have to upgrade.
- (this may not be a problem since nobody uses it anyway.)
- However, ~Store::Digest~ does some dumb stuff by using the canonical digest algorithm as the key, when all it needs is a 64-bit integer. So not only does it waste space, it makes things more complicated. Therefore:
- [ ] Overhaul the metadata so it uses integers as keys and the “main” hash algorithm (a concept which is still necessary for resolving the filenames in bulk storage) doesn’t have special status in the metadata database.
- We may as well add the caching infrastructure itself to the thing while we’re at it.
- [ ] new field (I think?) in the metadata: last-access time
- [ ] new initialization parameter: cache size
- [ ] write the cache expiration algorithm; hook it to a retrieval event
- make a new table in the key-value database that maps atime as a non-unique key to a record containing pk and size
- the main record will have the old atime so a full scan won’t be necessary to delete the old record in this lookup table
- delete the old record and insert one with the new atime
- (set the initial atime to the insertion time)
- scan through this table from newest to oldest, tallying up the sizes.
- when you cross the capacity line, start deleting.
- (there is probably a smarter way to do this.)
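The expiration steps above can be modelled in memory like so. It is a sketch of the idea, not of the LMDB layout: hashes stand in for the two tables, timestamps are passed in by the caller, every object is assumed to be cache, and ~SketchCache~ and its methods are invented names.

```ruby
# Hypothetical in-memory model of the main table and the atime lookup table.
class SketchCache
  def initialize(capacity)
    @capacity = capacity
    @main  = {}  # pk => { size:, atime: }
    @atime = {}  # atime => [pk, ...] (non-unique key)
  end

  # The initial atime is the insertion time.
  def insert(pk, size, now)
    unindex(pk, @main[pk][:atime]) if @main.key?(pk)
    @main[pk] = { size: size, atime: now }
    (@atime[now] ||= []) << pk
    expire
  end

  # On retrieval: the main record holds the old atime, so the stale lookup
  # entry can be removed without a full scan.
  def touch(pk, now)
    rec = @main.fetch(pk)
    unindex(pk, rec[:atime])
    rec[:atime] = now
    (@atime[now] ||= []) << pk
    expire
  end

  def keys
    @main.keys
  end

  private

  def unindex(pk, atime)
    return unless (bucket = @atime[atime])
    bucket.delete(pk)
    @atime.delete(atime) if bucket.empty?
  end

  # Walk newest to oldest, tallying sizes; past the capacity line, delete.
  def expire
    total = 0
    @atime.keys.sort.reverse_each do |t|
      @atime[t].dup.each do |pk|
        total += @main[pk][:size]
        next if total <= @capacity
        unindex(pk, t)
        @main.delete(pk)
      end
    end
  end
end
```

This rescans the whole lookup table on every event, which is exactly the part that probably has a smarter version.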
- Are we gonna want to record statistics about thrashing? probably but not right away.
- Ordinary cache statistics (like hit/miss rate) are not meaningful in ~Store::Digest~ because hit/miss against what?
- You get a cached value in lieu of something else but all requests to ~Store::Digest~ are directly to hashes, so it doesn’t know what it’s caching, it only knows that a particular object is considered (by some other system) to be cache.
- That said, knowing that certain objects are regularly getting deleted and reinserted (by the cache expiration policy, that is) is an indication that the cache is too small.
- Are we gonna want logging? uggghghgh
- inclined to say maybe someday but not critical for ~Intertwingler~
- What about ~Store::Digest::HTTP~, the Web front-end?
- [ ] Maybe make it more like an ~Intertwingler~ handler, or otherwise make a subclass of it in the ~Intertwingler~ namespace.
- There are some improvements that can be made to the index pages, but they aren’t critical for shipping ~Intertwingler~.
- Polling the handlers until one returns something other than 404 (or 405) is a pretty inefficient strategy and it would be good to do something smarter than that.
- In order to do something smarter though we need to know the sets of resources each handler has and what request methods they respond to.
- This is what the handler’s manifest is supposed to advertise.
- (In some cases an entire handler may only respond to a subset of request methods. Transforms for instance are only supposed to respond to ~POST~. If we knew up front that no resource within a handler ever responded to the request’s method, we could rule it out with minimal processing.)
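The naive polling strategy is short enough to sketch in full. Handlers here are anything that responds to ~call~ and returns a Rack-style triple; ~dispatch~ is a made-up name, and preferring a 405 over a 404 when nobody answers is an assumption of this sketch, not something the notes settle.

```ruby
# Hypothetical dispatcher: poll handlers in the configured order until one
# returns something other than 404 (or 405).
def dispatch(handlers, env)
  last = nil
  handlers.each do |handler|
    status, headers, body = handler.call(env)
    # anything other than 404/405 is taken as the answer
    return [status, headers, body] unless [404, 405].include?(status)
    # remember a 405 in preference to a 404: somebody has the resource,
    # it just doesn't respond to this method
    last = [status, headers, body] if last.nil? || status == 405
  end
  last || [404, {}, []]
end
```

Every miss costs a full call into a handler, which is the inefficiency a manifest would let the engine skip.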
- The idea for the handler manifest protocol is that calling ~OPTIONS *~ on the handler with ~Prefer: return=representation~ will disgorge the handler’s manifest, which is a list of all URIs it knows it has. Therefore:
- [ ] come up with the manifest format,
- [ ] implement as much plumbing as is reasonable in the ~Intertwingler::Handler~ base class.
- These are handlers that aren’t strictly necessary for an MVP and/or may be a lot of effort
- While not strictly necessary for an MVP, a proxy handler would be necessary for the ultimate goal of making ~Intertwingler~ a layered system.
- Making it so anybody can access anything on the internet is also problematic, so some kind of access control will need to be in place before it could go live, even if rudimentary.
- I mean, the backend is RDF; it should probably have one, right?
- That said, SPARQL is an excellent ready-made vector for a denial-of-service attack, to say nothing of security over the content of the graph.
- You could make one in an afternoon if you didn’t have to think about this, but I’d rather solve for capability-based access control first.
- Many of the markup transforms are going to be important for MVP, but we only need crop and resize image transforms for now.
- [ ] ~Intertwingler::Transform::Raster~
- These aren’t currently used by anything but they would unambiguously be useful.
- [ ] Flip transform
- Flip is easy enough to implement but to be quite honest I can never remember which flip is which. Like is a horizontal flip a flip about the horizontal axis, ie a flip upside down, or is it a flip that is like a mirror? (ie a flip about the vertical axis).
- (also a flip on both axes equals a rotate by a half-turn, and we have no way of expressing that currently.)
- Inclined to call ~flip~ upside down and ~mirror~ for, well, mirror.
- [ ] with tests
- [ ] with documentation
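To keep the naming straight, here is the proposed ~flip~/~mirror~ split on a toy raster, modelled as an array of rows. This has nothing to do with Vips; the functions (including ~half_turn~) are made up purely to pin down which axis is which.

```ruby
# Upside down: a flip about the horizontal axis (row order reverses).
def flip(rows)
  rows.reverse
end

# Like a mirror: a flip about the vertical axis (each row reverses).
def mirror(rows)
  rows.map(&:reverse)
end

# Both at once is the same as rotating by a half-turn.
def half_turn(rows)
  flip(mirror(rows))
end
```

The two operations commute, so it doesn’t matter which one runs first when composing the half-turn.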
- [ ] Rotate transform
- 90-degree rotate is a completely different beast than arbitrary rotate, but it doesn’t make sense to have two different rotates.
- Non-90-degree rotate will have to insist on an output format with an alpha channel, like PNG.
- Rotate about the centre and then resize to the bounding box; leave the corners transparent.
- you can tee up the crop transform after this.
- (I know it’s inefficient to calculate an alpha channel just to throw it away but this’ll eventually get run once and cached.)
- [ ] with tests
- [ ] with documentation
- [ ] Knockout transform
- The idea behind knockout is you can knock out a monochromatic border of an image and get just the subject floating in the middle.
- I put this here cause I wanted it but this will actually be kind of tough to implement.
- unless (even if) I can find a decent smart masking algorithm somewhere, this is way more effort than just wrapping a stock library function.
- [ ] with tests
- [ ] with documentation
- [ ] Brightness transform
- Like Photoshop brightness.
- [ ] with tests
- [ ] with documentation
- [ ] Contrast transform
- Like Photoshop contrast.
- [ ] with tests
- [ ] with documentation
- [ ] Gamma transform
- I dunno if I want to mess with this but it’ll probably be easy and I feel like I should.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::Tidy~
- This is a simple one; it just has a single resource that runs ~tidy~ (or rather it’s an interface to ~libtidy~). Since ~tidy~ converses in byte streams, it isn’t appropriate to lump it in with the other markup transforms that operate over parsed Nokogiri (~libxml~) instances.
- [ ] with tests
- [ ] with documentation
- [ ] ~Intertwingler::Transform::RDF~
- This is a handy transformer between different RDF serialization formats. Again it’s not strictly necessary for MVP, but it will be useful in particular for content negotiation on resources that just spit out one kind of RDF (including RDFa). This is also super straightforward except for JSON-LD, which is going to require more thinking. (A naïve conversion to JSON-LD is of course easy but JSON-LD has lots of features like contexts and framing that will need design attention.)
- [ ] with tests
- [ ] with documentation
- [ ] XXX what about RDF-star?
- [ ] Triples
- [ ] N-Triples target
- [ ] Turtle target
- [ ] RDF/XML target
- [ ] Quads
- [ ] NQuads target
- [ ] TriG target
- [ ] JSON-LD target
- [ ] XXX do we try to do contexts???
- [ ] expand/contract/framing??
- A scraper/crawler is necessary for fetching things like link previews and scoping out referrers, but could also do things like fetch RSS feeds or other chores.
- There is already a stub scraper/crawler in the source tree but it needs some love.
- We want to be able to do something like call ~intertwingler shell~ or just ~intertwingler~ with no arguments and it loads up a shell.
- We want all the commands that you can do on the command line to also be accessible within the shell.
Loupe processor
- Loupe is a planned vocabulary for making markup documents out of RDF by dictating the following:
- [ ] predicate order
- [ ] predicate show/hide
- note “hide” can mean invisible but present vs completely omitted from the representation
- gut says “completely omit from representation” should happen at the data source level, ie the processor does not have access to see what it should be omitting from the representation
- [ ] value order
- [ ] value show/hide
- [ ] label determination
- [ ] value disposition
- [ ] resources
- [ ] link
- [ ] embed (image, video, audio, iframe, object, script)
- [ ] inline (fragment)
- [ ] literals
- [ ] block
- [ ] inline
- [ ] merged
- [ ] alternates
- [ ] element selection
- [ ] block (section, div, paragraph, figure, etc)
- [ ] list (ol, ul, dl)
- note ~rdf:List~ treatment as well
- [ ] serialize to (X)HTML+RDFa
- [ ] serialize to JSON-LD (?)
- [ ] command-line tool that can:
- [ ] spawn a web server
- [ ] that resolves URIs
- [ ] that appropriately does redirects
- [ ] that resolves 410s (gone)
- [ ] that resolves 300s (multiple choices)
- [ ] that does content negotiation where applicable
- [ ] that generates (X)HTML with all the trimmings
- [ ] that applies transformation functions to whatever is thrown at it (modulo mime type compatibility)
- [ ] spawn a scraper/crawler
- [ ] that traces redirects
- [ ] that is smart enough to recognize loops
- [ ] that can either resolve a given list or follow links
- [ ] that stores content in the content-addressable store
- [ ] that returns an rdf graph of the metadata
- [ ] spawn a shell
- [ ] that can view and edit THE rdf graph
- [ ] with term completion
- [ ] with shortcuts for certain vocabs
- [ ] with commands for common bulk rdf operations
- [ ] Create an ~Intertwingler::Config~ configuration file parser
- [ ] Main ~Intertwingler~ namespace has a convenience function for loading an ~Intertwingler::Engine~ instance from a config file
- [ ] with tests
- [ ] with documentation
- [ ] Loads configuration
- [ ] handles multiple authorities (host names + aliases)
- [ ] optionally shares RDF store but optionally doesn’t
- [ ] Central dispatcher
- [ ] Figure out how ~OPTIONS *~ manifests are going to work
- [ ] actually make them
- [ ] Some facility for routing to meaningful error messages
- [ ] Handles ~410 Gone~
- [ ] Handles ~300 Multiple Choices~
- [ ] ~Intertwingler::Handler::Proxy~
- [ ] with tests
- [ ] with documentation
- [ ] ~Prefer: respond-async~ and ~wait=N~
- [ ] ~GET~ every URL in the manifest, save it out to the file system
- [ ] depends on figuring out manifests
- [ ] write out rewrite maps
- [ ] include documentation for configuring Apache
- [ ] nginx, IIS too?? (can they even do conneg?)
- [ ] general cleanup
- [ ] tests
- [ ] documentation
- [ ] rename ~URLRunner~ to ~Crawler~
- [ ] general cleanup
- [ ] tests
- [ ] documentation
- [ ] general cleanup
- [ ] tests
- [ ] documentation
- [ ] actually finish this
- [ ] tests
- [ ] documentation
- [ ] spawn engine
- [ ] HTTP
- [ ] FastCGI
- [ ] option to use UNIX socket
- [ ] load RDF graph
- [ ] dump RDF graph to syntax of choice
- [ ] load file(s) into content-addressable store
- [ ] crawl external links
- [ ] batch-run document stats
- [ ] batch-run NLP scan
- [ ] disgorge data to JSON(-LD?)/CSV
- [ ] all batch commands also available in shell
- [ ] tab completion
- [ ] RDF data entry (Turtle with tab completion)
- [ ] run SPARQL queries (also with tab completion)
- [ ] output to CSV or RDF
- [ ] Eliminate ~Intertwingler::Context~ and ~Intertwingler::Context::Document~
- [ ] Eliminate ~Intertwingler::Source~ and ~Intertwingler::Surface~
- [ ] Eliminate old junk from ~Intertwingler::Transform~
- [ ] Eliminate ~Intertwingler::Console~
- [ ] Eliminate ~Intertwingler::Util::Messy~
- [ ] Merge ~Intertwingler::Util::Clean~ into ~Intertwingler::Util~ and eliminate all explicit references to it
- [ ] Installation guide
- [ ] Sample configurations
- [ ] Download and install materials
- [ ] Docker image
- [ ] MimeMagic
- [ ] Rack