cascalog.api documentation

<-

macro

(<- outvars & predicates)
Constructs a query or predicate macro from a list of
predicates. Predicate macros support destructuring of the input and
output variables.
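
For example (a sketch; people is a hypothetical in-memory generator,
i.e. a Clojure sequence of tuples):

(def people [["alice" 28] ["bob" 33]])

(def adults
  (<- [?name ?age]
      (people ?name ?age)
      (>= ?age 30)))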

?-

(?- & bindings)
Executes one or more queries and emits the results of each query to
the associated tap.

Syntax: (?- sink1 query1 sink2 query2 ...)  or (?- query-name sink1
query1 sink2 query2)

 If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
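
For example (a sketch; adults is a hypothetical subquery, such as one
built with <- above):

(?- "adults-to-stdout"
    (stdout) adults)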

?<-

macro

(?<- & args)
Helper that both defines and executes a query in a single call.

Syntax: (?<- out-tap out-vars & predicates) or (?<- "myflow"
out-tap out-vars & predicates) ; flow name must be a static string
within the ?<- form.
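
For example (a sketch using an in-memory generator):

(?<- "squares" (stdout) [?x ?square]
     ([[1] [2] [3]] ?x)
     (* ?x ?x :> ?square))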

??-

(??- & args)
Executes one or more queries and returns a seq of seqs of tuples,
one for each query given.

Syntax: (??- query1 query2 ...) or (??- query-name query1 query2 ...)

If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
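
For example (a sketch; adults is a hypothetical subquery):

(let [[adult-tuples] (??- adults)]
  ;; adult-tuples is the seq of result tuples for the first (and only) query
  adult-tuples)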

??<-

macro

(??<- & args)
Like ?<-, but returns the results as a seq of tuples (as ??- does)
instead of sinking them to a tap.
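
For example (a sketch using an in-memory generator):

(??<- [?x ?doubled]
      ([[1] [2]] ?x)
      (* 2 ?x :> ?doubled))
;; returns the result tuples directly, e.g. ([1 2] [2 4])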

INumOutFields

IOutputFields

ISelectFields

aggregatefn

macro

(aggregatefn & body)

aggregateop

bufferfn

macro

(bufferfn & body)

bufferiterfn

macro

(bufferiterfn & body)

bufferiterop

bufferop

cascalog-tap

(cascalog-tap source sink)

combine

(combine & [g & gens])
Merge the tuples from the subqueries together into a single
subquery. Doesn't ensure uniqueness of tuples.
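
For example (a sketch; clicks-2013 and clicks-2014 are hypothetical
subqueries with identical output fields):

(def all-clicks
  (combine clicks-2013 clicks-2014))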

compile-flow

(compile-flow & args)
Attaches output taps to some number of subqueries and creates a
Cascading flow. The flow can be executed with `.complete`, or
introspection can be done on the flow.

Syntax: (compile-flow sink1 query1 sink2 query2 ...)
or (compile-flow flow-name sink1 query1 sink2 query2)

 If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
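
For example (a sketch; adults is a hypothetical subquery):

(let [flow (compile-flow "adults-flow" (stdout) adults)]
  (.complete flow))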

construct

(construct output-fields raw-predicates)
Parses predicates and output fields and returns a proper subquery.

cross-join

defaggregatefn

macro

(defaggregatefn name doc-string? attr-map? [fn-args*] body)
Defines an aggregating operation.
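
For example (a sketch, assuming the classic Cascalog aggregator
contract: a zero-argument init arity, a reducing arity, and a
single-argument finishing arity that returns the tuples to emit):

(defaggregatefn sum-vals
  "Sums a single numeric field within each group."
  ([] 0)
  ([total val] (+ total val))
  ([total] [total]))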

defaggregateop

macro

(defaggregateop name & body)

defbufferfn

macro

(defbufferfn name doc-string? attr-map? [fn-args*] body)
Defines a buffer operation.
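
For example (a sketch; a buffer receives all of a group's tuples as a
sequence and returns a sequence of output tuples):

(defbufferfn count-group
  "Emits the number of tuples in each group."
  [tuples]
  [(count tuples)])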

defbufferiterfn

macro

(defbufferiterfn name doc-string? attr-map? [fn-args*] body)
Defines a buffer operation that receives its group of tuples as an
iterator rather than a realized sequence.

defbufferiterop

macro

(defbufferiterop name & body)

defbufferop

macro

(defbufferop name & body)

deffilterfn

macro

(deffilterfn name doc-string? attr-map? [fn-args*] body)
Defines a filtering operation.
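
For example (a sketch):

(deffilterfn long-word?
  "Keeps only tuples whose word is longer than five characters."
  [word]
  (> (count word) 5))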

deffilterop

macro

(deffilterop name & body)

defmain

macro

(defmain name & forms)
Defines an AOT-compiled function with the supplied
`name`. The containing namespace must be marked for AOT compilation
for this to have any effect.
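
For example (a sketch; the paths are hypothetical command-line
arguments):

(defmain CopyLines [in-path out-path]
  (?- (hfs-textline out-path :sinkmode :replace)
      (<- [?line]
          ((hfs-textline in-path) ?line))))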

defmapcatfn

macro

(defmapcatfn name doc-string? attr-map? [fn-args*] body)
Defines a mapcat operation.
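
For example (a sketch):

(defmapcatfn tokenize
  "Emits one output tuple per whitespace-separated token."
  [line]
  (re-seq #"\S+" line))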

defmapcatop

macro

(defmapcatop name & body)

defmapfn

macro

(defmapfn name doc-string? attr-map? [fn-args*] body)
Defines a map operation.
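
For example (a sketch):

(defmapfn upper-case
  "Uppercases a single string field."
  [s]
  (.toUpperCase ^String s))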

defmapop

macro

(defmapop name & body)

defparallelagg

macro

(defparallelagg name doc-string? attr-map? & {:keys [init-var combine-var present-var]})
Binds an efficient aggregator to the supplied symbol. A parallel
aggregator processes each tuple through an initializer function,
then combines the results of each tuple's initialization until a
single result is achieved. `defparallelagg` accepts the following
keyword arguments:

:init-var -- A var bound to a fn that accepts raw tuples and returns
an intermediate result; #'one, for example.

:combine-var -- a var bound to a fn that both accepts and returns
intermediate results.

For example,

(defparallelagg sum
  :init-var #'identity
  :combine-var #'+)

Used as

(sum ?x :> ?y)

defprepfn

macro

(defprepfn name doc-string? attr-map? [fn-args*] body)
Defines a prepared operation.

div

(div f & rest)
Perform floating point division on the arguments. Use this instead
of / in Cascalog queries since / produces Ratio types which aren't
serializable by Hadoop.
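
For example (a sketch using an in-memory generator):

(?<- (stdout) [?x ?third]
     ([[1] [2] [3]] ?x)
     (div ?x 3 :> ?third))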

explain

(explain outfile query)(explain outfile sink-tap query)
Explains a query (by outputting a DOT file).

outfile  - String location for DOT file output.
sink-tap - Sink tap for query. Shows on query explanation. Defaults to stdout if omitted.
query    - Query to be explained.

Syntax: (explain outfile query)  or (explain outfile sink query)

Ex: (explain "outfile.dot" (<- [?a ?b] ([[1 2]] ?a ?b)))

filterfn

macro

(filterfn & body)

filterop

get-out-fields

(get-out-fields _)
Get the fields of a generator.

hfs-seqfile

(hfs-seqfile path & opts)
Creates a tap on HDFS using sequence file format. Different
 filesystems can be selected by using different prefixes for `path`.

Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/hfs-tap` for more keyword arguments.

 See http://www.cascading.org/javadoc/cascading/tap/Hfs.html and
 http://www.cascading.org/javadoc/cascading/scheme/SequenceFile.html
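
For example (a sketch; the bucket path is hypothetical):

(def user-counts
  (hfs-seqfile "s3n://my-bucket/user-counts"
               :outfields ["?user" "?count"]))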

hfs-tap

(hfs-tap scheme path-or-file & {:keys [sinkmode sinkparts sink-template source-pattern templatefields], :or {templatefields Fields/ALL}})
Returns a Cascading Hfs tap with support for the supplied scheme,
opened up on the supplied path or file object. Supported keyword
options are:

`:sinkmode` - can be `:keep`, `:update` or `:replace`.

`:sinkparts` - used to constrain the segmentation of output files.

`:source-pattern` - Causes resulting tap to respond as a GlobHfs tap
when used as source.

`:sink-template` - Causes resulting tap to respond as a TemplateTap when
used as a sink.

`:templatefields` - When pattern is supplied via :sink-template,
this option allows a subset of output fields to be used in the
naming scheme.

See, for example, the
http://docs.cascading.org/cascading/2.0/javadoc/cascading/scheme/local/TextDelimited.html
scheme.
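
For example (a sketch; the path and glob pattern are hypothetical):

(hfs-tap (text-line ["line"])
         "/data/logs"
         :source-pattern "part-*")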

hfs-textline

(hfs-textline path & opts)
Creates a tap on HDFS using textline format. Different filesystems
can be selected by using different prefixes for `path`. Supported
keyword options are:

`:outfields` - used to select the fields written to the tap

`:compression` - one of `:enable`, `:disable` or `:default`

See `cascalog.cascading.tap/hfs-tap` for more keyword arguments.

See http://www.cascading.org/javadoc/cascading/tap/Hfs.html and
http://www.cascading.org/javadoc/cascading/scheme/TextLine.html
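
For example (a sketch; the HDFS path is hypothetical):

(def raw-logs
  (hfs-textline "hdfs://namenode/data/logs"
                :compression :enable))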

lfs-seqfile

(lfs-seqfile path & opts)
Creates a tap that reads data off of the local filesystem in
 sequence file format.

Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/lfs-tap` for more keyword arguments.

 See http://www.cascading.org/javadoc/cascading/tap/Lfs.html and
 http://www.cascading.org/javadoc/cascading/scheme/SequenceFile.html

lfs-tap

(lfs-tap scheme path-or-file & {:keys [sinkmode sinkparts sink-template source-pattern templatefields], :or {templatefields Fields/ALL}})
Returns a Cascading Lfs tap with support for the supplied scheme,
opened up on the supplied path or file object. Supported keyword
options are:

`:sinkmode` - can be `:keep`, `:update` or `:replace`.

`:sinkparts` - used to constrain the segmentation of output files.

`:source-pattern` - Causes resulting tap to respond as a GlobHfs tap
when used as source.

`:sink-template` - Causes resulting tap to respond as a TemplateTap
when used as a sink.

`:templatefields` - When pattern is supplied via :sink-template,
this option allows a subset of output fields to be used in the
naming scheme.

lfs-textline

(lfs-textline path & opts)
Creates a tap on the local filesystem using textline format.

Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/lfs-tap` for more keyword arguments.

 See http://www.cascading.org/javadoc/cascading/tap/Lfs.html and
 http://www.cascading.org/javadoc/cascading/scheme/TextLine.html

mapcatfn

macro

(mapcatfn & body)

mapcatop

mapfn

macro

(mapfn & body)

mapop

memory-source-tap

(memory-source-tap tuples)(memory-source-tap fields-in tuples)

name-vars

(name-vars gen vars)

normalize-sink-connection

(normalize-sink-connection sink subquery)

num-out-fields

(num-out-fields _)

parallelagg

predmacro

macro

(predmacro & body)
A more general but more verbose way to create predicate macros.

Creates a function that takes in [invars outvars] and returns a
list of predicates. When making predicate macros this way, you must
create intermediate variables with gen-nullable-var(s). This is
because, unlike the (<- [?a :> ?b] ...) style of defining predicate
macros, there is no explicit declaration of the inputs and outputs.

See https://github.com/nathanmarz/cascalog/wiki/Predicate-macros

predmacro*

(predmacro* fun)
Functional version of predmacro. See predmacro for details.

prepfn

macro

(prepfn args & body)
Defines a prepared operation. Pass in an argument vector of two
items and return either a function or a map with two
keywords: :operate and :cleanup.

select-fields

(select-fields gen fields)
Select fields of a named generator.

Example:
(<- [?a ?b ?sum]
    (+ ?a ?b :> ?sum)
    ((select-fields generator ["?a" "?b"]) ?a ?b))

sequence-file

(sequence-file field-names)

stdout

(stdout)
Creates a tap that prints tuples sunk to it to standard
output. Useful for experimentation in the REPL.

text-line

(text-line)(text-line field-names)(text-line source-fields sink-fields)(text-line source-fields sink-fields compression)

to-tail

(to-tail g & {:keys [fields]})

union

(union & gens)
Merge the tuples from the subqueries together into a single
subquery and ensure uniqueness of tuples.
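
For example (a sketch; users-from-web and users-from-mobile are
hypothetical subqueries with identical output fields):

(def unique-users
  (union users-from-web users-from-mobile))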

with-job-conf

macro

(with-job-conf conf & body)
Modifies the job conf for queries executed within the form. Nested
with-job-conf calls will merge configuration maps together, with
innermost calls taking precedence on conflicting keys.
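
For example (a sketch; adults is a hypothetical query):

(with-job-conf {"mapred.reduce.tasks" 20}
  (?- (stdout) adults))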

with-serializations

macro

(with-serializations serial-vec & forms)
Enables the supplied serializations for queries executed within the
form. Serializations should be provided as a vector of strings or
classes, like so:

(import 'org.apache.hadoop.io.serializer.JavaSerialization)
(with-serializations [JavaSerialization]
   (?<- ...))

Serializations nest: nested calls to with-serializations will merge
and deduplicate their serializations with any currently specified by
other calls to `with-serializations` or `with-job-conf`.