cascalog.api documentation
<-
macro
(<- outvars & predicates)
Constructs a query or predicate macro from a list of
predicates. Predicate macros support destructuring of the input and
output variables.
?-
(?- & bindings)
Executes one or more queries and emits the results of each query to
the associated tap.
Syntax: (?- sink1 query1 sink2 query2 ...) or (?- query-name sink1
query1 sink2 query2 ...)
If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
?<-
macro
(?<- & args)
Helper that both defines and executes a query in a single call.
Syntax: (?<- out-tap out-vars & predicates) or (?<- "myflow"
out-tap out-vars & predicates). The flow name must be a static
string within the ?<- form.
??-
(??- & args)
Executes one or more queries and returns a seq of seqs of tuples
back, one for each subquery given.
Syntax: (??- query1 query2 ...) or (??- query-name query1 query2 ...)
If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
??<-
macro
(??<- & args)
Like ??-, but for ?<-. Returns a seq of tuples.
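The four execution forms above differ only in whether the query is defined inline and whether the results go to a tap or come back as a value. The following is a minimal REPL sketch, assuming Cascalog is on the classpath and running in local mode; `people`, `young`, `results` and `inline-results` are hypothetical names introduced for illustration.

```clojure
(use 'cascalog.api)

;; Hypothetical in-memory generator of [name age] tuples.
(def people [["alice" 28] ["bob" 33] ["chris" 40]])

;; <- only defines the query; nothing is executed yet.
(def young
  (<- [?name ?age]
      (people ?name ?age)
      (< ?age 35)))

;; ?- executes the query and emits its tuples to the given sink tap.
(?- (stdout) young)

;; ??- executes and returns a seq of seqs of tuples, one per query.
(def results (??- young))

;; ??<- defines and executes in one call and returns a seq of tuples.
(def inline-results
  (??<- [?name ?age]
        (people ?name ?age)
        (< ?age 35)))
```

The order of the returned tuples should not be relied on; compare results as sets when testing.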
aggregatefn
macro
(aggregatefn & body)
bufferfn
macro
(bufferfn & body)
bufferiterfn
macro
(bufferiterfn & body)
cascalog-tap
(cascalog-tap source sink)
combine
(combine & [g & gens])
Merge the tuples from the subqueries together into a single
subquery. Doesn't ensure uniqueness of tuples.
compile-flow
(compile-flow & args)
Attaches output taps to some number of subqueries and creates a
Cascading flow. The flow can be executed with `.complete`, or
introspection can be done on the flow.
Syntax: (compile-flow sink1 query1 sink2 query2 ...)
or (compile-flow flow-name sink1 query1 sink2 query2 ...)
If the first argument is a string, that will be used as the name
for the query and will show up in the JobTracker UI.
construct
(construct output-fields raw-predicates)
Parses predicates and output fields and returns a proper subquery.
defaggregatefn
macro
(defaggregatefn name doc-string? attr-map? [fn-args*] body)
Defines an aggregate operation.
defaggregateop
macro
(defaggregateop sym & body)
defbufferfn
macro
(defbufferfn name doc-string? attr-map? [fn-args*] body)
Defines a buffer operation.
defbufferiterfn
macro
(defbufferiterfn name doc-string? attr-map? [fn-args*] body)
Defines a bufferiter operation.
defbufferiterop
macro
(defbufferiterop sym & body)
defbufferop
macro
(defbufferop sym & body)
deffilterfn
macro
(deffilterfn name doc-string? attr-map? [fn-args*] body)
Defines a filtering operation.
deffilterop
macro
(deffilterop sym & body)
defmain
macro
(defmain name & forms)
Defines an AOT-compiled function with the supplied
`name`. Containing namespace must be marked for AOT compilation to
have any effect.
defmapcatfn
macro
(defmapcatfn name doc-string? attr-map? [fn-args*] body)
Defines a mapcat operation.
defmapcatop
macro
(defmapcatop sym & body)
defmapfn
macro
(defmapfn name doc-string? attr-map? [fn-args*] body)
Defines a map operation.
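As a rough illustration of how `defmapfn` fits together with `defmapcatfn` and `deffilterfn` described above, the sketch below defines one operation of each kind and uses them in a single query. It assumes the `defn`-like syntax shown in the signatures and a local-mode REPL; `split-words`, `long-word?`, `shout` and `loud-words` are hypothetical names.

```clojure
(use 'cascalog.api)
(require '[clojure.string :as str])

;; mapcat operation: one input tuple may produce several output tuples.
(defmapcatfn split-words [line]
  (str/split line #"\s+"))

;; filter operation: keeps a tuple only when the body returns true.
(deffilterfn long-word? [word]
  (> (count word) 3))

;; map operation: exactly one output tuple per input tuple.
(defmapfn shout [word]
  (str/upper-case word))

(def loud-words
  (??<- [?loud]
        ([["hello big wide world"]] ?line)
        (split-words ?line :> ?word)
        (long-word? ?word)
        (shout ?word :> ?loud)))
```

Map and mapcat operations bind their results with `:>`, while a filter operation is used as a bare predicate.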
defmapop
macro
(defmapop sym & body)
defparallelagg
macro
(defparallelagg name doc-string? attr-map? & {:keys [init-var combine-var present-var]})
Binds an efficient aggregator to the supplied symbol. A parallel
aggregator processes each tuple through an initializer function,
then combines the results of each tuple's initialization until one
result is achieved. `defparallelagg` accepts two keyword arguments:
:init-var -- A var bound to a fn that accepts raw tuples and returns
an intermediate result; #'one, for example.
:combine-var -- a var bound to a fn that both accepts and returns
intermediate results.
For example,
  (defparallelagg sum
    :init-var #'identity
    :combine-var #'+)
Used as
  (sum ?x :> ?y)
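The initialize-then-combine behaviour described above can be modelled in plain Clojure without running a job. This is only a sketch of the semantics, not how Cascalog executes the aggregator; `simulate-parallel-agg`, `total` and `n` are hypothetical names that are not part of the library.

```clojure
;; Hypothetical helper, not part of Cascalog: models a parallel aggregator
;; as "initialize every value, then fold the intermediate results".
(defn simulate-parallel-agg
  [init-fn combine-fn values]
  (reduce combine-fn (map init-fn values)))

;; Mirrors the `sum` aggregator from the docstring: identity, then +.
(def total (simulate-parallel-agg identity + [1 2 3 4]))

;; A count-style aggregator: every value initializes to 1, then +.
(def n (simulate-parallel-agg (constantly 1) + [:a :b :c]))

(println total n) ; prints "10 3"
```

Because intermediate results are presumably combined in whatever grouping the cluster produces, a combine function that is associative and commutative (such as `+`) is the safe choice.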
defprepfn
macro
(defprepfn name doc-string? attr-map? [fn-args*] body)
Defines a prepared operation.
div
(div f & rest)
Perform floating point division on the arguments. Use this instead
of / in Cascalog queries since / produces Ratio types which aren't
serializable by Hadoop.
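The Ratio issue can be seen in plain Clojure. The sketch below uses a hypothetical stand-in, `float-div`, that coerces the first argument to a double; it is an assumption about the general approach, not a copy of how `div` is implemented.

```clojure
;; Integer division with / yields an exact clojure.lang.Ratio.
(def exact (/ 1 3))

;; Hypothetical stand-in for div: coerce the first argument to a double
;; so the whole division is carried out in floating point.
(defn float-div
  [f & more]
  (apply / (double f) more))

(def approx (float-div 1 3))

(println (class exact))   ; a Ratio
(println (class approx))  ; a Double
```

Inside a query the library function would be used as a predicate, for example `(div ?a ?b :> ?ratio)`.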
explain
(explain outfile query)
(explain outfile sink-tap query)
Explains a query (by outputting a DOT file).
outfile - String location for DOT file output.
sink-tap - Sink tap for query. Shows on query explanation. Defaults to stdout if omitted.
query - Query to be explained.
Syntax: (explain outfile query) or (explain outfile sink query)
Ex: (explain "outfile.dot" (<- [?a ?b] ([[1 2]] ?a ?b)))
filterfn
macro
(filterfn & body)
get-out-fields
(get-out-fields _)
Get the fields of a generator.
hfs-seqfile
(hfs-seqfile path & opts)
Creates a tap on HDFS using sequence file format. Different
filesystems can be selected by using different prefixes for `path`.
Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/hfs-tap` for more keyword arguments.
See http://www.cascading.org/javadoc/cascading/tap/Hfs.html and
http://www.cascading.org/javadoc/cascading/scheme/SequenceFile.html
hfs-tap
(hfs-tap scheme path-or-file & {:keys [sinkmode sinkparts sink-template source-pattern templatefields], :or {templatefields Fields/ALL}})
Returns a Cascading Hfs tap with support for the supplied scheme,
opened up on the supplied path or file object. Supported keyword
options are:
`:sinkmode` - can be `:keep`, `:update` or `:replace`.
`:sinkparts` - used to constrain the segmentation of output files.
`:source-pattern` - Causes resulting tap to respond as a GlobHfs tap
when used as source.
`:sink-template` - Causes resulting tap to respond as a TemplateTap when
used as a sink.
`:templatefields` - When pattern is supplied via :sink-template,
this option allows a subset of output fields to be used in the
naming scheme.
See, for example, the TextDelimited scheme:
http://docs.cascading.org/cascading/2.0/javadoc/cascading/scheme/local/TextDelimited.html
hfs-textline
(hfs-textline path & opts)
Creates a tap on HDFS using textline format. Different filesystems
can be selected by using different prefixes for `path`. Supported
keyword options are:
`:outfields` - used to select the fields written to the tap
`:compression` - one of `:enable`, `:disable` or `:default`
See `cascalog.cascading.tap/hfs-tap` for more keyword arguments.
See http://www.cascading.org/javadoc/cascading/tap/Hfs.html and
http://www.cascading.org/javadoc/cascading/scheme/TextLine.html
lfs-seqfile
(lfs-seqfile path & opts)
Creates a tap that reads data off of the local filesystem in
sequence file format.
Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/lfs-tap` for more keyword arguments.
See http://www.cascading.org/javadoc/cascading/tap/Lfs.html and
http://www.cascading.org/javadoc/cascading/scheme/SequenceFile.html
lfs-tap
(lfs-tap scheme path-or-file & {:keys [sinkmode sinkparts sink-template source-pattern templatefields], :or {templatefields Fields/ALL}})
Returns a Cascading Lfs tap with support for the supplied scheme,
opened up on the supplied path or file object. Supported keyword
options are:
`:sinkmode` - can be `:keep`, `:update` or `:replace`.
`:sinkparts` - used to constrain the segmentation of output files.
`:source-pattern` - Causes resulting tap to respond as a GlobHfs tap
when used as source.
`:sink-template` - Causes resulting tap to respond as a TemplateTap
when used as a sink.
`:templatefields` - When pattern is supplied via :sink-template,
this option allows a subset of output fields to be used in the
naming scheme.
lfs-textline
(lfs-textline path & opts)
Creates a tap on the local filesystem using textline format.
Supports keyword option for `:outfields`. See
`cascalog.cascading.tap/lfs-tap` for more keyword arguments.
See http://www.cascading.org/javadoc/cascading/tap/Lfs.html and
http://www.cascading.org/javadoc/cascading/scheme/TextLine.html
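A short sketch of the textline taps in use, copying lines from one local file to an output directory. It assumes local mode and that `:sinkmode` is passed through to `lfs-tap` as described above; the paths under `/tmp/cascalog-demo` and the names `in-path` and `out-path` are hypothetical.

```clojure
(use 'cascalog.api)

;; Hypothetical local paths; adjust to your environment.
(def in-path  "/tmp/cascalog-demo/in.txt")
(def out-path "/tmp/cascalog-demo/out")

;; Create a small input file to read from.
(.mkdirs (java.io.File. "/tmp/cascalog-demo"))
(spit in-path "first line\nsecond line\n")

;; Read each line from the source tap and write it to the sink tap.
;; :sinkmode :replace overwrites any existing output at out-path.
(?<- (lfs-textline out-path :sinkmode :replace)
     [?line]
     ((lfs-textline in-path) ?line))
```

Swapping `lfs-textline` for `hfs-textline` should target HDFS (or another filesystem selected by the path prefix) with the same query.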
mapcatfn
macro
(mapcatfn & body)
memory-source-tap
(memory-source-tap tuples)
(memory-source-tap fields-in tuples)
name-vars
(name-vars gen vars)
normalize-sink-connection
(normalize-sink-connection sink subquery)
num-out-fields
(num-out-fields _)
predmacro
macro
(predmacro & body)
A more general but more verbose way to create predicate macros.
Creates a function that takes in [invars outvars] and returns a
list of predicates. When making predicate macros this way, you must
create intermediate variables with gen-nullable-var(s). This is
because unlike the (<- [?a :> ?b] ...) way of doing pred macros,
Cascalog doesn't have a declaration for the inputs/outputs.
See https://github.com/nathanmarz/cascalog/wiki/Predicate-macros
predmacro*
(predmacro* fun)
Functional version of predmacro. See predmacro for details.
prepfn
macro
(prepfn args & body)
Defines a prepared operation. Pass in an argument vector of two
items and return either a function or a map with two keys,
:operate and :cleanup.
select-fields
(select-fields gen fields)
Select fields of a named generator.
Example:
  (<- [?a ?b ?sum]
      (+ ?a ?b :> ?sum)
      ((select-fields generator ["?a" "?b"]) ?a ?b))
sequence-file
(sequence-file field-names)
stdout
(stdout)
Creates a tap that prints tuples sunk to it to standard
output. Useful for experimentation in the REPL.
text-line
(text-line)
(text-line field-names)
(text-line source-fields sink-fields)
(text-line source-fields sink-fields compression)
to-tail
(to-tail g & {:keys [fields]})
union
(union & gens)
Merge the tuples from the subqueries together into a single
subquery and ensure uniqueness of tuples.
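To contrast `union` with `combine`, the sketch below merges two small subqueries that share one tuple. It assumes a local-mode REPL; `a`, `b`, `combined` and `unioned` are hypothetical names.

```clojure
(use 'cascalog.api)

;; Two subqueries with the same output arity; both contain the tuple [2].
(def a (<- [?x] ([[1] [2]] ?x)))
(def b (<- [?x] ([[2] [3]] ?x)))

;; combine merges the tuples without deduplicating them.
(def combined (first (??- (combine a b))))

;; union merges the tuples and keeps each distinct tuple once.
(def unioned (first (??- (union a b))))
```

Per the descriptions above, `unioned` should hold three distinct tuples, while `combined` may contain `[2]` twice.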
with-job-conf
macro
(with-job-conf conf & body)
Modifies the job conf for queries executed within the form. Nested
with-job-conf calls will merge configuration maps together, with
innermost calls taking precedence on conflicting keys.
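A sketch of nested `with-job-conf` calls, assuming a local-mode REPL. The configuration keys and values are illustrative only, and `my.app.label` is a hypothetical key. The `effective` map models the documented merge in plain Clojure; it is not read back from Cascalog.

```clojure
(use 'cascalog.api)

(def outer-conf {"mapred.reduce.tasks" "10"
                 "my.app.label"        "demo"})   ; hypothetical key
(def inner-conf {"mapred.reduce.tasks" "4"})

;; The query runs with both maps merged; the inner value wins on conflict.
(def result
  (with-job-conf outer-conf
    (with-job-conf inner-conf
      (??<- [?x] ([[1] [2] [3]] ?x)))))

;; Plain-Clojure model of the merge described above.
(def effective (merge outer-conf inner-conf))

(println (get effective "mapred.reduce.tasks")) ; prints "4"
```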
with-serializations
macro
(with-serializations serial-vec & forms)
Enables the supplied serializations for queries executed within the
form. Serializations should be provided as a vector of strings or
classes, like so:
  (import 'org.apache.hadoop.io.serializer.JavaSerialization)
  (with-serializations [JavaSerialization]
    (?<- ...))
Serializations nest; nested calls to with-serializations are merged
and deduplicated with the serializations already specified by other
calls to `with-serializations` or `with-job-conf`.