Questions tagged [flink-streaming]

Apache Flink is an open source platform for scalable batch and stream data processing. Flink supports batch and streaming analytics in one system. Analytical programs can be written in concise and elegant APIs in Java and Scala.

0
votes
1 answer
15 views

Flink 1.6 bucketing sink HDFS files stuck in .in-progress

I am writing a Kafka data stream to a bucketing sink in an HDFS path. Kafka gives out string data. Using FlinkKafkaConsumer010 to consume from Kafka. -rw-r--r-- 3 ubuntu supergroup 4097694 2018-10-19 ...
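
The usual reason part files stay in .in-progress is that checkpointing is off: BucketingSink only finalizes pending files when a checkpoint completes. A minimal sketch of the setup the question describes, with hypothetical broker, topic, path and sizes:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
    import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

    public class KafkaToHdfsJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Pending files are only moved to "finished" when a checkpoint completes,
            // so without checkpointing they stay in .in-progress/.pending.
            env.enableCheckpointing(60_000);

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical broker
            props.setProperty("group.id", "hdfs-writer");

            DataStream<String> lines = env.addSource(
                    new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props));

            BucketingSink<String> sink = new BucketingSink<>("hdfs:///data/out"); // hypothetical path
            sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
            sink.setBatchSize(1024L * 1024L * 128L);       // roll a part file at ~128 MB
            sink.setInactiveBucketThreshold(10 * 60_000L); // close buckets idle for 10 minutes
            sink.setInactiveBucketCheckInterval(60_000L);

            lines.addSink(sink);
            env.execute("kafka-to-hdfs");
        }
    }
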
0
votes
1 answer
25 views

Watermarks in a RichParallelSourceFunction

I am implementing a SourceFunction which reads data from a database. The job should be able to be resumed if stopped or crashed (i.e. savepoints and checkpoints) with the data being processed exactly ...
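
A source that controls event time usually emits timestamps and watermarks itself from run(), under the checkpoint lock so emission stays consistent with checkpointed read positions. A minimal sketch, with the database read and timestamp extraction left as hypothetical placeholders:

    import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;
    import org.apache.flink.streaming.api.watermark.Watermark;

    public class DatabaseSource extends RichParallelSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                String record = readNextRowFromDb();       // hypothetical DB access
                long eventTime = extractTimestamp(record); // hypothetical
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collectWithTimestamp(record, eventTime);
                    // allow 5 seconds of out-of-orderness before advancing event time
                    ctx.emitWatermark(new Watermark(eventTime - 5_000L));
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        private String readNextRowFromDb() { return "row"; }                                // placeholder
        private long extractTimestamp(String record) { return System.currentTimeMillis(); } // placeholder
    }

For exactly-once resumption the source would additionally implement CheckpointedFunction so the current read position is stored in checkpointed state.
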
0
votes
0 answers
11 views

Flink - From streaming to batch - Start job from another

I would like to know if there is a simple way to launch a Flink job from another job. I have a streaming job listening on the file system for new files, and I need to process those files (packaged)...
0
votes
0 answers
21 views

The implementation of the provided ElasticsearchSinkFunction is not serializable (flink-connector-elasticsearch6_2.11)

"non-serializable" error occurs when I follow flink document to write data via flink streaming. I use flink1.6,Elastic-Search-6.4 and flink-connector-elasticsearch6.My code is like @Test...
0
votes
1 answer
25 views

Apache Flink — daily aggregation of hourly aggregated data

I have a windowed hourly aggregated DataStream. DataStream<RawData> ds = ...; SingleOutputStreamOperator<HourlyAggregated> hourly = ds.keyBy(HourlyCountersAggregation.KEY_SELECTOR)...
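
One way to read this pattern: the hourly window operator emits records whose event-time timestamp is the end of each hourly window, so a second event-time window over a day can be applied directly downstream. A minimal sketch reusing the names from the question, with the daily key selector and aggregate left hypothetical (assumes the usual imports for TumblingEventTimeWindows and Time):

    SingleOutputStreamOperator<HourlyAggregated> hourly = ds
            .keyBy(HourlyCountersAggregation.KEY_SELECTOR)
            .window(TumblingEventTimeWindows.of(Time.hours(1)))
            .aggregate(new HourlyCountersAggregation());

    // Window operators stamp their output with the window's max timestamp,
    // so a second event-time window can roll the hourly results up per day.
    DataStream<DailyAggregated> daily = hourly
            .keyBy(h -> h.getKey())                            // hypothetical key accessor
            .window(TumblingEventTimeWindows.of(Time.days(1)))
            .aggregate(new DailyCountersAggregation());        // hypothetical AggregateFunction
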
0
votes
1 answer
19 views

Why do we need StreamingFileSink when BucketingSink already exists?

I found that BucketingSink can do everything StreamingFileSink can do, such as writing events to local files, NAS or HDFS (originally I thought BucketingSink could only write events to HDFS and ...
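
For comparison, the StreamingFileSink equivalent of a row-format bucketing sink is quite small; the main practical differences are that it builds on Flink's recoverable writers and relies on checkpointing to finalize files. A minimal sketch with a hypothetical output path, where stream is the DataStream<String> being written:

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

    StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("hdfs:///data/out"), new SimpleStringEncoder<String>("UTF-8"))
            .build();

    stream.addSink(sink);
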
0
votes
0 answers
14 views

Flink SQL: full join two tables based on a time window

I have two stream tables.
A table:
id | data1 | ts
------------------------
1  | xxxx  | 1234
B table:
id | data2 | ts
------------------------
2  | yyyy  | 1234
And I want to ...
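
A sketch of the general shape such a query takes, expressed as a time-windowed join predicate on the two ts attributes and submitted through a registered StreamTableEnvironment (tableEnv); whether FULL OUTER time-windowed joins are actually supported depends on the Flink version, so this shows the syntax rather than guaranteeing it runs on 1.6:

    Table result = tableEnv.sqlQuery(
            "SELECT a.id, a.data1, b.data2 " +
            "FROM A a FULL OUTER JOIN B b " +
            "  ON a.id = b.id " +
            " AND a.ts BETWEEN b.ts - INTERVAL '10' MINUTE " +
            "           AND b.ts + INTERVAL '10' MINUTE");
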
0
votes
0 answers
22 views

How can I obtain the processing time of a specific operator?

I'm going to work on the effect of parallelism on different basic operators in Flink. I tried to change the parallelism of some specific operators. That worked, but I don't know how I can ...
2
votes
1 answer
27 views

Flink Scala - Comparison method violates its general contract

I am writing a project in Flink that involves streaming a set of query points over batched data and performing a full sequential scan to find the nearest neighbors. What should be a simple sort ...
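
That exception comes from Java's TimSort detecting a comparator that is not a consistent total order, which is easy to trigger when comparing distances by subtracting doubles and casting to int, or when NaN values appear. A small illustration (not the poster's code) of the problem and the usual fix:

    import java.util.Comparator;

    class Neighbor {
        double distance;
        Neighbor(double distance) { this.distance = distance; }
    }

    class DistanceOrdering {
        // Problematic: the cast truncates toward zero, so small differences compare
        // as equal inconsistently, and NaN breaks transitivity; TimSort may then throw
        // "Comparison method violates its general contract!".
        static final Comparator<Neighbor> BROKEN =
                (a, b) -> (int) (a.distance - b.distance);

        // Safe: Double.compare defines a consistent total order (NaN sorts last).
        static final Comparator<Neighbor> SAFE =
                Comparator.comparingDouble(n -> n.distance);
    }
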
0
votes
0 answers
17 views

Apache Beam Flink runner doesn't process all documents when applying TextIO watchForNewFiles

I am using Apache Beam 2.7.0. I saw in the Capability Matrix (https://beam.apache.org/documentation/runners/capability-matrix/) that the Flink runner supports Splittable DoFn, but it seems like it doesn't work. I ...
1
vote
1 answer
17 views

Flink Table/SQL API: modify rowtime attribute after session window aggregation

I want to use Session window aggregation and then run Tumble window aggregation on top of the produced result in the Table API/Flink SQL. Is it possible to modify the rowtime attribute after the first session ...
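
In the Table API the session window itself can expose a rowtime attribute via w.rowtime, which can then drive a subsequent tumbling window. A sketch with hypothetical column names (userId, amount) on an input Table named input:

    Table sessionized = input
            .window(Session.withGap("10.minutes").on("rowtime").as("w"))
            .groupBy("userId, w")
            .select("userId, w.rowtime as rowtime, amount.sum as total");

    Table daily = sessionized
            .window(Tumble.over("1.day").on("rowtime").as("d"))
            .groupBy("userId, d")
            .select("userId, d.start as day, total.sum as dailyTotal");
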
0
votes
1 answer
25 views

Can I use a custom partitioner with group by?

Let's say that I know that my dataset is unbalanced and I know the distribution of the keys. I'd like to leverage this to write a custom partitioner to get the most out of the operator instances. I know ...
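
For the DataSet API there is partitionCustom, which lets you route keys to specific parallel instances; note that a later groupBy may still introduce its own hash partitioning, so the custom distribution is typically combined with per-partition processing. A sketch with a hypothetical skew-aware partitioner over an existing DataSet named data:

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.tuple.Tuple2;

    DataSet<Tuple2<String, Integer>> partitioned = data.partitionCustom(
            new Partitioner<String>() {
                @Override
                public int partition(String key, int numPartitions) {
                    // Hypothetical: give the known heavy key its own slot,
                    // spread the rest over the remaining partitions.
                    if (numPartitions == 1 || "hot-key".equals(key)) {
                        return 0;
                    }
                    return 1 + ((key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1));
                }
            },
            0);  // position of the key field in the tuple
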
3
votes
1 answer
139 views

JpaRepository not autowiring in custom RichSinkFunction

I have created a custom Flink RichSinkFunction and attempted to autowire a JpaRepository within this custom class but I am constantly getting a NullPointerException. If I autowire it in the ...
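
The sink instance is serialized and shipped to the task managers, where no Spring context has injected anything, hence the null repository. A common workaround, sketched here with hypothetical MyEntity, MyEntityRepository and SpringContextHolder classes, is to keep the field transient and look the bean up in open() on the worker side:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class JpaSink extends RichSinkFunction<MyEntity> {

        // Repository proxies are not Serializable; keep them out of the serialized sink.
        private transient MyEntityRepository repository;

        @Override
        public void open(Configuration parameters) {
            // Hypothetical helper that starts (or reuses) an application context
            // on the task manager and hands out beans.
            this.repository = SpringContextHolder.getBean(MyEntityRepository.class);
        }

        @Override
        public void invoke(MyEntity value, Context context) {
            repository.save(value);
        }
    }
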
0
votes
1 answer
37 views

Apache Flink: Kafka connector in Python streaming API, “Cannot load user class”

I am trying out Flink's new Python streaming API and attempting to run my script with flink-1.6.1/bin/pyflink-stream.sh examples/read_from_kafka.py. The Python script is fairly straightforward; I am ...
0
votes
1 answer
26 views

Apache Flink Kubernetes Job Arguments

I'm trying to set up a cluster (Apache Flink 1.6.1) with Kubernetes and get the following error when I run a job on it: 2018-10-09 14:29:43.212 [main] INFO org.apache.flink.runtime.entrypoint....
0
votes
1 answer
18 views

Flink - event-time sliding window with missing data in window due to time gaps

Suppose I have a stream of stock market trading events, like this:
technical1, ALXN, 1/1/2016
technical1, CELG, 1/1/2016
technical2, ALXN, 1/2/2016
technical2, CELG, 1/2/2016
. . .
technicalN, ALXN, ...
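
For context, the sliding event-time window itself would look like the sketch below (hypothetical Trade class with symbol and timestamp fields, and the usual window imports assumed); the harder part the question is about, days with no events producing no window output, is not solved by the assigner alone:

    DataStream<Trade> trades = ...;

    trades
        .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Trade>() {
            @Override
            public long extractAscendingTimestamp(Trade t) {
                return t.timestampMillis;   // hypothetical epoch-millis field
            }
        })
        .keyBy(t -> t.symbol)
        .window(SlidingEventTimeWindows.of(Time.days(7), Time.days(1)))
        .aggregate(new TradeStatsAggregate());  // hypothetical AggregateFunction
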
0
votes
0 answers
9 views

How to handle out-of-order or late events in a non-windowed pipeline in Apache Beam

My pipeline usage is more like a rolling updater. It starts from Kafka, grabs the messages, keeps updating an updater, and finally generates 0 or multiple decision event(s) from the updater. It is ...
0
votes
1 answer
38 views

Deploy stream processing topology on runtime?

Hi all, I have a requirement wherein I need to re-ingest some of my older data. We have a multi-staged pipeline, the source of which is a Kafka topic. Once a record is fed into that, it runs through ...
1
vote
1 answer
31 views

Apache Flink: Stream Join Window is not triggered

I'm trying to join two streams in Apache Flink to get some results. The current state of my project is that I am fetching Twitter data and mapping it into a 2-tuple, where the language of the user and ...
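
The most common reason an event-time window join never calls its apply function is missing timestamps/watermarks on the inputs: without watermarks the window never closes. A sketch of the two usual remedies, assuming a hypothetical Tweet type with a timestamp field:

    // Option A: stay with event time, but assign timestamps and watermarks on BOTH inputs.
    DataStream<Tweet> withTimestamps = tweets.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<Tweet>(Time.seconds(10)) {
                @Override
                public long extractTimestamp(Tweet t) {
                    return t.timestampMillis;   // hypothetical field
                }
            });

    // Option B: switch the join window to processing time, which needs no watermarks:
    //   .window(TumblingProcessingTimeWindows.of(Time.seconds(30)))
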
0
votes
1 answer
39 views

Using Flink to get Counts Within a Keyed Window

I'm using Flink via the Scala interface to do some data processing. I have some user data that comes in tuples:
(user1, "titanic")
(user1, "titanic")
(user1, "batman")
(user2, "star wars")
(user2, "...
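
A compact way to get per-(user, title) counts within a keyed window, sketched here in the Java API (the Scala API is analogous): map each tuple to (user, title, 1), key by the first two fields, and sum the count field per window. The window size and the name views are assumptions:

    DataStream<Tuple2<String, String>> views = ...;   // (user, title) pairs as in the question

    DataStream<Tuple3<String, String, Integer>> counts = views
            .map(v -> Tuple3.of(v.f0, v.f1, 1))
            .returns(Types.TUPLE(Types.STRING, Types.STRING, Types.INT))
            .keyBy(0, 1)                      // key by user and title
            .timeWindow(Time.hours(1))        // hypothetical window size
            .sum(2);                          // sum the count field
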
0
votes
0 answers
62 views

Flink Connection refused: localhost/127.0.0.1:8081

I am getting the following error org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8081 while trying to stream ...
0
votes
0 answers
18 views

FlinkKafkaConsumer not detecting new topics

I have a FlinkKafkaConsumer11 with a regex. It listens to the existing topics satisfying the regex, but newly created topics are not detected. After restarting the job, it starts detecting those topics ...
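
Pattern-based subscription only re-evaluates the pattern when partition/topic discovery is enabled; by default the consumer resolves the regex once at startup. A sketch assuming the 0.11 connector (FlinkKafkaConsumer011) and hypothetical broker and pattern values:

    import java.util.Properties;
    import java.util.regex.Pattern;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");
    props.setProperty("group.id", "regex-consumer");
    // Enables periodic discovery of new partitions AND new topics matching the pattern.
    props.setProperty("flink.partition-discovery.interval-millis", "30000");

    FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
            Pattern.compile("events-.*"),   // hypothetical topic pattern
            new SimpleStringSchema(),
            props);
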
0
votes
1 answer
38 views

Flink: Could not find a suitable table factory for 'org.apache.flink.table.factories.DeserializationSchemaFactory' in the classpath

I am using Flink's Table API. I receive data from Kafka, then register it as a table, then I use a SQL statement to process it, and finally convert the result back to a stream and write it to a directory. The ...
0
votes
0 answers
17 views

Flink - Process datastream in a bounded context (like Dataset)

I need to join some files to perform some processing. The DataSet API is a perfect way to achieve this; I am able to do it when I read the files in batch (CSV). However, in the prod environment, I will ...
0
votes
1 answer
20 views

Flink: the result of the SocketWindowWordCount example isn't what I expected

I am new to Flink. I follow the quickstart on the Flink website and deploy Flink on a single machine. After I run "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000" and enter ...
0
votes
1 answer
59 views

Processing-time windows don't work on finite data sources in Apache Flink

I'm trying to apply a very simple window function to a finite data stream in Apache Flink (locally, no cluster). Here's the example:
val env = StreamExecutionEnvironment.getExecutionEnvironment
env...
0
votes
2 answers
28 views

Apache Flink KeyedStream after window operator behavior clarification

I'm requesting clarification on exactly how Apache Flink (1.6.0) handles events from KeyedStreams after the events have been sent through a window and some operator (such as reduce() or process()) has ...
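
The short version of what usually needs clarifying here: the result of keyBy(...).window(...).reduce(...) (or process(...)) is a plain DataStream, not a KeyedStream, so downstream keyed state or keyed windows require keying again. A small sketch with a hypothetical Event type:

    DataStream<Event> perWindow = events
            .keyBy(e -> e.key)
            .window(TumblingEventTimeWindows.of(Time.minutes(5)))
            .reduce((a, b) -> Event.merge(a, b));   // hypothetical merge

    // 'perWindow' is no longer keyed; to use keyed state or another keyed window,
    // key it again (the records still carry their key field):
    perWindow
            .keyBy(e -> e.key)
            .process(new MyKeyedProcessFunction());  // hypothetical KeyedProcessFunction
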
0
votes
2 answers
27 views

Flink DataStream - how to start a source from an input element?

Say I have a Flink SourceFunction<String> called RequestsSource. On each request coming in from that source, I would like to subscribe to an external data source (for the purposes of an ...
0
votes
0 answers
38 views

Flink Savepoints akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager_1]

While running a Flink (1.6.0) DataStream job with RocksDB as the backend storage and HDFS as the storage for checkpoints, I've encountered akka.pattern.AskTimeoutException while trying to do a savepoint via the CLI....
0
votes
0 answers
21 views

Flink window operator schema evolution - savepoint deserialization fail

I run a simple streaming job where I compute hourly impressions for campaigns: .keyBy(imp => imp.campaign_id).window(TumblingEventTimeWindows.of(...)).aggregate(new ...
