clj-headlights.input-output

Tools for pipeline data input and output

absolute-path?

(absolute-path? resource-string)

drop-file-url-protocol

(drop-file-url-protocol resource-string)

file-url?

(file-url? resource-string)

gcs-url?

(gcs-url? resource-string)

multi-source

(multi-source pipeline name resource-strings)

Take a collection of resource strings and return a composite transform which contains all of those resources. If the collection is empty, return an empty PCollection.

pubsub-input?

(pubsub-input? resource-string)

read-json-source

(read-json-source pcoll composite-name resource-string)

Inputs: [pcoll :- pcollections/PCollectionType composite-name :- s/Str resource-string :- s/Str]

Like resource-string->source, but parses each element from a JSON string into an object.

resource-string->pcollection

(resource-string->pcollection pipeline name resource-string)

resource-string->source

(resource-string->source resource-string)

Construct a Dataflow source transform to read from a resource-string. Supported are:

* Local files (file://)
* GCS (gs://)
* PubSub topics / subscriptions
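The resource kinds listed above can be told apart by inspecting the string itself. The sketch below is a hypothetical helper, not the library's implementation: the name `resource-kind` is invented for illustration, and the PubSub pattern assumes the standard Cloud Pub/Sub resource names (`projects/<project>/topics/<topic>` and `projects/<project>/subscriptions/<subscription>`), which this page does not specify.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not part of clj-headlights): classify a resource
;; string into one of the kinds listed in the docstring above.
(defn resource-kind
  [resource-string]
  (cond
    (str/starts-with? resource-string "file://") :local-file
    (str/starts-with? resource-string "gs://")   :gcs
    ;; Assumption: PubSub resources follow the standard Cloud Pub/Sub naming.
    (re-matches #"projects/[^/]+/(topics|subscriptions)/[^/]+" resource-string)
    :pubsub
    :else :unknown))

(resource-kind "gs://example-bucket/input/part-*")      ;; => :gcs
(resource-kind "projects/example/subscriptions/events") ;; => :pubsub
```

The predicates documented on this page (`file-url?`, `gcs-url?`, `pubsub-input?`) presumably answer the same questions one kind at a time; check their source before relying on any particular matching rule.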

url->sink

(url->sink url)

Construct a Dataflow sink transform to write text to a url. Supported are:

* Local files (file://)
* GCS (gs://)
* PubSub topics

write-groups-to-partitioned-files

(write-groups-to-partitioned-files pipeline name destination suffix)

Inputs: [pipeline :- pcollections/PCollectionType name :- s/Str destination :- s/Str suffix :- s/Str]

Returns: pcollections/PCollectionType

write-json-to-sink

(write-json-to-sink pcoll name url)

Like write-to-sink, but maps elements to JSON before writing.

write-to-sink

(write-to-sink pcoll name sink-url)

Construct a Dataflow transform to write text to a sink and apply it to a pcoll. Supported are:

* Local files (file://)
* GCS (gs://)
* PubSub topics
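A usage sketch for the two write functions, using only the argument orders shown in the signatures on this page. It assumes `lines` is an existing PCollection of strings and `records` an existing PCollection of JSON-serializable values, both built elsewhere. The function name `write-outputs`, the step names, and the bucket paths are placeholders, and the exact output file layout is not described here.

```clojure
(ns example.sinks
  (:require [clj-headlights.input-output :as io]))

;; Sketch only: `lines` and `records` are assumed to be PCollections
;; created elsewhere in the pipeline.
(defn write-outputs
  [lines records]
  ;; Signature from this page: (write-to-sink pcoll name sink-url)
  (io/write-to-sink lines "write-text" "gs://example-bucket/output/text")
  ;; Signature from this page: (write-json-to-sink pcoll name url)
  (io/write-json-to-sink records "write-json" "gs://example-bucket/output/json"))
```

Each step gets a distinct name because pipeline runners generally expect transform names to be unique within a pipeline.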