Overview of Time Series Databases in Systems Programming

By: Kyriakou Stefanos (skyria12 AT cs.ucy.ac.cy)
Leontiou Panayiotis (pleont02 AT cs.ucy.ac.cy)
Tymvios Stelios (stymvi01 AT cs.ucy.ac.cy)
EPL421: Systems Programming
Time Series Databases
https://www.cs.ucy.ac.cy/courses/EPL421
Main presentation topics
Relational vs Tagset data model
What is a time series database, and some of the currently available options
InfluxDB compared to other time series databases
Setup of Couchbase, Anyplace and InfluxDB
API usage
Security issues
InfluxDB tutorial
Relational data model
The database management system is responsible for describing the data structures and storing the data, while retrieval procedures are responsible for answering queries
Most relational databases use the SQL data definition and query language
Each time-series measurement is recorded in its own row, with a time field followed by any number of other fields
Fields support a variety of data types, including complex ones
Indexes can be created on any field or on multiple fields
Any of these fields can be used as a foreign key to secondary tables
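A minimal sketch of the relational layout described here, using Python's built-in sqlite3 module. The table and column names (devices, readings, device_id, temperature) are illustrative assumptions and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    -- Secondary table holding normalized metadata (hypothetical schema).
    CREATE TABLE devices (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- One row per measurement: a time field followed by other fields.
    CREATE TABLE readings (
        time        INTEGER NOT NULL,
        device_id   INTEGER NOT NULL REFERENCES devices(id),
        temperature REAL
    );
    -- Indexes are an explicit choice: here one on (device_id, time).
    CREATE INDEX idx_readings_device_time ON readings (device_id, time);
""")
conn.execute("INSERT INTO devices (id, name) VALUES (1, 'sensor-a')")
conn.executemany(
    "INSERT INTO readings (time, device_id, temperature) VALUES (?, ?, ?)",
    [(1, 1, 20.5), (2, 1, 21.0), (3, 1, 21.5)],
)
# Range query over time, joined with the metadata table via the foreign key.
rows = conn.execute(
    "SELECT r.time, d.name, r.temperature "
    "FROM readings r JOIN devices d ON d.id = r.device_id "
    "WHERE r.time BETWEEN 2 AND 3 ORDER BY r.time"
).fetchall()
print(rows)
```

The schema, the index and the foreign key all have to be chosen up front, which is the trade-off the following slides discuss.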
Advantages of Relational data model
A narrow or wide table, based on how much data and metadata we want to record per reading
Many indexes to speed up queries or few indexes to reduce disk usage
Denormalized metadata within the measurement row, or normalized metadata that lives in a separate table
A rigid schema that validates input types or a schemaless JSON blob to increase iteration speed
Check constraints that validate inputs, for instance checking for uniqueness or non-null values
Disadvantages of Relational data model
You need to choose a schema and explicitly decide whether or not to use indexes
Tagset data model
Each measurement has a timestamp, an associated set of tags (tagset) and a set of
fields (fieldset)
The fieldset represents the actual measurement data values.
The tagset represents the metadata to describe the measurements
Field data types are limited to floats, ints, strings, and booleans, and cannot be
changed without rewriting the data
Tagset values are always represented as strings and cannot be updated
Tagset values are indexed while fieldset values are not
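A small sketch of the tagset model as a data structure, serialized to InfluxDB line protocol. The Point class and helper names are hypothetical; the formatting rules used (string fields in double quotes, integer fields with an i suffix, tags sorted by key) follow the InfluxDB 1.x line protocol, and escaping of special characters is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

FieldValue = Union[float, int, str, bool]

@dataclass
class Point:
    measurement: str
    tags: Dict[str, str] = field(default_factory=dict)           # indexed metadata, strings only
    fields: Dict[str, FieldValue] = field(default_factory=dict)  # measured values, not indexed
    time_ns: Optional[int] = None                                # None lets the server assign the time

def format_field(value):
    if isinstance(value, bool):   # check bool before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # integer fields carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + value + '"'      # string fields are double-quoted

def to_line(point):
    tags = "".join(f",{k}={v}" for k, v in sorted(point.tags.items()))
    fields = ",".join(f"{k}={format_field(v)}" for k, v in point.fields.items())
    line = f"{point.measurement}{tags} {fields}"
    if point.time_ns is not None:
        line += f" {point.time_ns}"
    return line

p = Point("location", {"deviceID": "myDevice"},
          {"latitude": "22", "longitude": "17", "timestamp": 900000333})
print(to_line(p))
```

Note that no schema is declared anywhere: the tag and field keys are inferred from the first point written.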
Advantages of tagset data model
Very easy to get started
No need to create schemas or indexes
Disadvantages of Tagset data model
Rigid and limited, with no option to create
any additional indexes
The underlying schema is auto-generated
based on the input data, which may differ
from the desired schema
What is a Time series database (TSDB)?
A software system that is optimized for
storing and serving time series through
associated pairs of time and values.
Why are time series DBs exploding in popularity?
Enterprises want to be able to query, analyze and create reports based on streaming data in real time, instead of in batch mode.
Over the past years, time series databases have exploded in popularity, according to DB-Engines data
No database type has grown faster in
popularity than time series DBs
According to Timescale CEO Ajay Kulkarni …
Time-series datasets track changes to the overall system as INSERTs, not UPDATEs.
What makes time-series data so powerful is that it records every change to the system as a new row.
Time-series databases make it possible to analyze how something changed in the past, to monitor how it is changing in the present, and even to make predictions about how it may change in the future
Choosing a time series DB
Check what each option offers and whether it fits your needs
Should be open source
No data size limitations
Free to use
Fast and efficient
Some of the available options
InfluxDB
Graphite
TimescaleDB
OpenTSDB
VictoriaMetrics
InfluxDB
First released: 2013
Ranked first for the last 3 years (2017-2019)
Completely open-source time series database
Runs on all current operating systems
Supports a very large set of programming languages
Optimized for heavy write loads
Handles concurrency very well
Schema-free database
Why choose InfluxDB?
Super easy to install, configure and launch
As a NoSQL-like database, you don't have to set up a schema for your database
InfluxData provides a visualization tool
Queried with Flux, a new data-processing language that is becoming a tech trend, or with an SQL-like language (both can also be used through HTTP requests)
Gives more power to the user but at the same time reduces the power of the database
Stores data in LSM trees, which are better suited for storing time series data compared to the general-purpose storage provided by PostgreSQL
Drawbacks:
1. No same-time inserts (points of one series that share a timestamp overwrite each other)
2. Poor performance for deletion with predicates
Graphite
Very widely used time series database system
Powerful monitoring tool that stores numeric time series data
Can display the stored data on demand via its Graphite-web interface at a fair speed
Mostly used as a store for system, network and application performance metrics
Big companies such as Booking.com, Reddit and GitHub use it on a daily basis to easily detect outages in their infrastructure
Why choose Graphite?
Built to deal with numeric data
Graphite Web is an interface for developers to monitor their
application
Connects with a lot of tools natively
Makes it easy for developers to connect with their existing
infrastructure
TimescaleDB
Open source
Built on SQL
Very large set of supported programming languages
Built directly on PostgreSQL
Offers a unique set of time series related operations (like fast ingest)
Why choose TimescaleDB?
Supports the SQL language natively
No need to learn a new language
Big companies rely on SQL-constrained systems in order to ensure system reliability and accessibility
Drawbacks:
1. Quickly reaches the disk bandwidth limit, which can be lifted by using more expensive disks with higher read/write bandwidth, such as high-end SSDs
2. Requires much more storage space than VictoriaMetrics and InfluxDB for storing the same amount of data points
OpenTSDB
Able to store hundreds of billions of data rows over distributed instances of TSD servers
Schema-free database built on Apache HBase
HBase is a non-relational database management system designed to store big tables in an elegant and efficient way
Why choose OpenTSDB?
Can handle several million writes per second
Better performance than InfluxDB when dealing with more than one million writes per second.
OpenTSDB integrates with Cassandra, BigTable, CollectD, StatsD, Chef and even Puppet for deployment management
VictoriaMetrics
Natively supports PromQL (does not support SQL)
Supports a wide range of retention periods starting from 1 month
Compresses on-disk data better than competitors (according to their website), which means it can handle longer retentions without downsampling
Excels on heavy queries over thousands of metrics with millions of data points
Open source under the Apache 2.0 license
Why choose VictoriaMetrics?
Requires fewer hardware resources (RAM, CPU, storage), which saves hardware costs
Outperforms InfluxDB and TimescaleDB on data ingestion
VictoriaMetrics has the best optimization for disk IO bandwidth usage, compared to InfluxDB and TimescaleDB.
VictoriaMetrics provides better vertical scalability for both data ingestion and querying, compared to InfluxDB and TimescaleDB
Stores data in LSM trees, which are better suited for storing time series data compared to the general-purpose storage provided by PostgreSQL
Drawbacks: It is a relatively new database, which was written from scratch and may contain unpolished code
RAM usage for various cardinalities
Our mission
Install and configure Couchbase
Install and configure Anyplace
Install and configure InfluxDB
Create API endpoints to connect Anyplace with InfluxDB
What is Anyplace?
A free and open Indoor Navigation Service
with excellent accuracy
A first-of-a-kind indoor information service
offering GPS-less localization, navigation
and search inside buildings using ordinary
smartphones
Awards
2018 - Best Demo Award 19th IEEE International Conference on Mobile Data Management
June 26 - June 28, 2018, Aalborg, Denmark.
2017 - Honorable Mention Award 18th IEEE International Conference on Mobile Data
Management May 29 - June 1, 2017, KAIST, Daejeon, South Korea.
2014 - 1st place at Evaluation of RF-based Indoor Localization Solutions for the Future
Internet (EVARILOS Open Challenge), European Union, Berlin, Germany
2014 - 2nd place at Microsoft Research Indoor Localization Competition at IEEE/ACM IPSN
2014, Berlin, Germany.
2012 - Best Demo Award at IEEE Mobile Data Management Conference, Bangalore, India.
Anyplace architecture
Scala
High-level language combining object-oriented and functional programming
Scala's static types help avoid bugs in complex applications
Its JVM and JavaScript runtimes let users build high-performance systems with access to huge ecosystems of libraries
Play framework
Lightweight, stateless, web-friendly architecture
Uses Akka and Akka Streams under the covers to provide predictable
and minimal resource consumption (CPU, memory, threads)
Akka and Akka Streams abstract away the imperative details of how data enters the application, giving us a declarative way to describe and handle it while hiding details that we don't care about. Streaming helps you ingest, process, analyze, and store data in a quick and responsive manner.
What is Couchbase?
An open-source, distributed, multi-model NoSQL document-oriented database software package that is optimized for interactive applications.
Designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput.
Designed to be clustered from a single machine to very large-scale deployments using many machines.
Installation & configuration of Couchbase
Install Couchbase:
curl -O https://packages.couchbase.com/releases/couchbase-release/couchbase-release-1.0-6-amd64.deb
sudo dpkg -i ./couchbase-release-1.0-6-amd64.deb
sudo apt-get update
sudo apt-get install couchbase-server-community
Please note that you have to update your firewall configuration to allow connections to the following ports: 4369, 8091 to 8094, 9100 to 9105, 9998, 9999, 11209 to 11211, 11214, 11215, 18091 to 18093, and 21100 to 21299.
Installation & configuration of Couchbase
Start Couchbase:
sudo service couchbase-server start
Stop Couchbase:
sudo service couchbase-server stop
Check status:
sudo service couchbase-server status
Installation & configuration of Couchbase
Configuration: Visit the address below to configure Couchbase
http://localhost:8091/
* Make sure that port 8091 is open
Installation & configuration of anyplace
Install:
wget https://anyplace.cs.ucy.ac.cy/downloads/anyplace_v3.zip
unzip anyplace_v3.zip
Installation & configuration of anyplace
Configuration: Edit the configuration file under the anyplace folder
vim conf/application.conf
Edit the following fields accordingly (all values must be in double quotes except port numbers):
application.secret=< This is a Play Framework parameter >
couchbase.hostname=< Default is "http://localhost" >
couchbase.port=< Default is 8091 >
couchbase.bucket=< Name of the Couchbase bucket, must be the same as the username >
couchbase.username=< Username for the Couchbase database >
couchbase.password=< Password for the Couchbase database >
influxdb.hostname=< Default is "http://localhost" >
influxdb.port=< Default is 8086 >
influxdb.database=< Name of the InfluxDB database >
Installation & configuration of anyplace
Create Couchbase views:
git clone https://github.com/dmsl/anyplace.git
cd anyplace/server/anyplace_views
chmod +x create-views.sh
vim create-views.sh
Add USERNAME, PASSWORD and BUCKET according to your Couchbase account, then save and close the script
Run the script to create the views: ./create-views.sh
Installation & configuration of anyplace
To launch Anyplace:
cd anyplace_v3/bin
chmod +x anyplace_v3
./anyplace_v3 (alternatively use: $ nohup ./anyplace_v3 > anyplace.log 2>&1 )
To stop Anyplace: Press Ctrl-C or kill the respective process
Installation & configuration of anyplace
To test Anyplace, visit the following URLs:
http://localhost:9000/viewer
http://localhost:9000/architect
http://localhost:9000/developers
Installation & configuration of anyplace
Install a free certificate from https://letsencrypt.org/ on your Anyplace server to obtain a secure HTTPS connection.
(Optional) Install a free load balancer from HAProxy to scale your installation to multiple Anyplace servers. In case of an Anyplace cluster configuration, please install the certificate on the load balancer.
Installation & configuration of influxDB
Download InfluxDB on Ubuntu:
wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
source /etc/lsb-release
echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
Install InfluxDB:
sudo apt-get update && sudo apt-get install influxdb
Installation & configuration of influxDB
Start InfluxDB:
sudo service influxdb start
Stop InfluxDB:
sudo service influxdb stop
Check status of InfluxDB:
sudo service influxdb status
Installation & configuration of influxDB
Configure InfluxDB:
vim /etc/influxdb/influxdb.conf
To view the default configuration settings:
influxd config
To use your own configuration file, execute InfluxDB with:
influxd -config /etc/influxdb/influxdb.conf
TCP port 8086 is available for client-server communication using the InfluxDB API
TCP port 8088 is available for the RPC service to perform backup and restore operations
Both ports should be open to use InfluxDB
API route mapping
# Anyplace API - InfluxDB
POST        /anyplace/influxdb/insert            controllers.AnyplaceInfluxdb.insertInfluxdb()
POST        /anyplace/influxdb/query_2_points    controllers.AnyplaceInfluxdb.query2PointsInfluxdb()
* located in anyplace/server/conf/routes
How it works?
GeoHash
Spatial Hashing
Geohash Lookups
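The slides do not spell out the encoding itself, so the following is only a sketch of the standard geohash algorithm (alternating longitude and latitude bisection, five bits per base-32 character). The function name geohash_encode is an assumption; it is not the authors' ./geohash helper.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(latitude, longitude, precision=8):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(22, 17, 8))
```

Nearby points share a common prefix, so a lookup over an area can be turned into a prefix match on the geohash string. For latitude 22 and longitude 17 the sketch yields s7uj4ugp, the value that appears in the geohash column of the insert example in these slides.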
 
In practice
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"deviceID":"5","point1":{"latitude":"37.350101","longitude":"37.3501"},"point2":{"latitude":"37.35012","longitude":"37.3502"}, "beginTime":1, "endTime":10}' \
  http://10.16.30.47:9000/anyplace/influxdb/query_2_points
{
   "5":[
      {
         "deviceID":"5","point":{"latitude":"37.350101","longitude":"37.3501"},
         "timestamp":1,"ifxtime":"2019-11-23T20:43:31.81891762Z"
      },
      {
         "deviceID":"5","point":{"latitude":"37.350102","longitude":"37.3502"},
         "timestamp":2,"ifxtime":"2019-11-23T20:43:32.056097637Z"
      },
      {
         "deviceID":"5","point":{"latitude":"37.35011","longitude":"37.3501"},
         "timestamp":10,"ifxtime":"2019-11-23T20:43:32.287098102Z"
      }
   ]
}
* We use strings because queries misbehaved due to floating-point representation, e.g.
0.13 != "0.13".asFloat()
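The slides do not say which conversion caused the mismatch. As an illustration only, assuming a 32-bit float appears somewhere along the path, the effect can be reproduced in Python (as_float32 is a hypothetical helper):

```python
import struct

def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]

# 0.13 has no exact binary representation, so the 32-bit and 64-bit
# approximations differ and an equality filter on the number fails.
print(as_float32(0.13) == 0.13)

# Comparing the textual form is exact.
print("0.13" == "0.13")

# Large integers are affected as well once they pass through a 32-bit float.
print(as_float32(900000333.0))
```

Storing coordinates as strings sidesteps the comparison problem, at the cost of losing numeric range queries on those values.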
In practice
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"deviceID":"5","point":{"latitude":"37.350101","longitude":"37.3501"},"distance":"16", "beginTime":1, "endTime":20}' \
  http://10.16.30.47:9000/anyplace/influxdb/query_1_point_distance
{
   "5":[
      { "deviceID":"5","point":{"latitude":"37.350101","longitude":"37.3501"},
         "timestamp":1,"ifxtime":"2019-11-23T20:43:31.81891762Z"},
      { "deviceID":"5","point":{"latitude":"37.350102","longitude":"37.3502"},
         "timestamp":2,"ifxtime":"2019-11-23T20:43:32.056097637Z"},
      { "deviceID":"5","point":{"latitude":"37.35011","longitude":"37.3501"},
         "timestamp":10,"ifxtime":"2019-11-23T20:43:32.287098102Z"}
   ]
}
In practice
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"deviceID":"myDevice", "point":{"latitude":"22","longitude":"17"}, "timestamp":900000333}' \
  http://10.16.30.47:9000/anyplace/influxdb/insert
$ influx -database anyplace
Connected to http://localhost:8086 version 1.7.8
InfluxDB shell version: 1.7.8
> select * from location where deviceID =~ /myDevice/
name: location
time                deviceID      geohash  latitude longitude timestamp
----                --------      -------  -------- --------- ---------
1575021195168509151 myDevice      s7uj4ugp 22       17        900000320
Security issue
Use the user's API key concatenated with a name for the device; this will be stored in Couchbase along with the deviceID
The deviceID will be unique and will be created automatically within the Anyplace system
Future work
A graphical representation of the data could
be created to indicate the path followed by a
device, over a specific time period
A custom retention policy could be used for
influxDB so that the old data will be dropped
after passing a specified threshold/limit
InfluxDB tutorial (1)
Open the Command Line Interface (CLI): influx -precision rfc3339
Create a database: CREATE DATABASE mydb
Display all existing databases: SHOW DATABASES
Set the database for all future requests: USE mydb
Drop database: DROP DATABASE mydb
The -precision argument specifies the format/precision of any returned timestamps. In the example above, rfc3339 tells InfluxDB to return timestamps in RFC3339 format (YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ)
InfluxDB tutorial (2)
Insert data into the database:
INSERT <measurement>,<tag>=<value-string>[,<tag>=<value-string>]<space><field>=<value>[,<field>=<value>]
Examples:
INSERT epl421,student=Donald grade=9
INSERT epl421,student=Tom grade=8
INSERT epl421,student=Jerry grade=10
Select the data we just wrote:
SELECT <tag>[,<tag>], <field>[,<field>] FROM <measurement>
Examples:
SELECT "student", "grade" FROM "epl421"
SELECT * FROM "epl421" LIMIT 2
InfluxDB tutorial (3)
We could use a curl command in a bash script to insert data into InfluxDB. The inserts can be written to a file (up to 5000 inserts), where each line has the following structure:
<measurement>,<tag>=<value-string>[,<tag>=<value-string>…]<space><field>=<value>[,<field>=<value>…]
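A short sketch of the batching idea: split a long list of line-protocol lines into request bodies of at most 5000 lines each. The helper name batches and the sample lines are illustrative; each body would then be sent with one POST to the /write endpoint used in these slides.

```python
def batches(lines, size=5000):
    """Yield request bodies of at most `size` newline-separated lines."""
    for start in range(0, len(lines), size):
        yield "\n".join(lines[start:start + size])

# 12000 sample lines in the structure shown above.
lines = [f"location,deviceID={i} x=1,y=2" for i in range(12000)]
bodies = list(batches(lines))
sizes = [body.count("\n") + 1 for body in bodies]
print(sizes)  # [5000, 5000, 2000]
```

Sending a few large bodies instead of one request per point keeps the HTTP overhead low during bulk loads.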
InfluxDB tutorial (4)
The bash script below was used to insert data into InfluxDB (0.5 billion inserts). In each iteration the curl command inserts the 500 lines of data in the insertTmp.txt file into the database 'mydb'.
#!/bin/bash
db="mydb"   # one name for both CREATE DATABASE and the write URL
influx -execute "CREATE DATABASE $db"
for j in $(seq 1 1000000)
do
    x=$RANDOM
    y=$RANDOM
    timestamp=$j
    # geohash of the point from the authors' ./geohash helper; it is not part of the lines written below
    geohash="$(./geohash $x $y 8)"
    # one batch of 500 lines, one per deviceID
    for i in $(seq 1 500)
    do
        echo "location,deviceID=$i timestamp=$timestamp,x=$x,y=$y"
    done > insertTmp.txt
    curl -i -XPOST "http://localhost:8086/write?db=$db" --data-binary @insertTmp.txt &> /dev/null
    echo "Done loop $j"
done
InfluxDB tutorial (5)
When a record is inserted, InfluxDB by default uses the current time in nanoseconds as its timestamp. However, the time can also be specified by the user:
INSERT <measurement>,<tag>=<value>[,<tag>=<value>…] <field>=<value>[,<field>=<value>…] <time-in-nanoseconds>
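One way to produce the nanosecond value for an explicit timestamp, sketched in Python. The helper name to_epoch_ns is an assumption, and the date is only an example.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ns(moment):
    """Convert a timezone-aware datetime to integer nanoseconds since the Unix epoch."""
    delta = moment - EPOCH
    # Integer arithmetic avoids the rounding that float seconds * 1e9 would introduce.
    return (delta.days * 86400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1000

moment = datetime(2019, 11, 23, 20, 43, 31, tzinfo=timezone.utc)
print(f"INSERT epl421,student=Tom grade=8 {to_epoch_ns(moment)}")
```

Python datetimes only carry microseconds, so the last three digits of the result are always zero.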
InfluxDB tutorial (6)
More advanced select query:
SELECT * FROM "epl421" WHERE "grade" > 9
SELECT * FROM "epl421" WHERE ("grade" < 9 and "grade" > 6)
Select query with curl command using GET:
curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=mydb" --data-urlencode "q=SELECT \"student\", \"grade\" FROM \"epl421\" WHERE \"grade\" = 10"
Appending pretty=true to the URL enables pretty-printed JSON output. While this is useful for debugging or when querying directly with tools like curl, it is not recommended for production use as it consumes unnecessary network bandwidth.
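What --data-urlencode does to the query can be sketched with Python's standard library. The function name query_url is hypothetical; the /query endpoint and the db, q and pretty parameters are the ones used in the curl command.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def query_url(base, database, query, pretty=False):
    """Build a GET URL for the /query endpoint with URL-encoded parameters."""
    params = {"db": database, "q": query}
    if pretty:
        params["pretty"] = "true"  # debugging aid only, see the note above
    return f"{base}/query?{urlencode(params)}"

query = 'SELECT * FROM "epl421" WHERE "grade" > 9'
url = query_url("http://localhost:8086", "mydb", query)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0] == query)
```

Encoding matters because InfluxQL uses characters such as spaces, double quotes and > that are not allowed unescaped in a URL.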
InfluxDB tutorial (7)
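This script was used to find the records for a device based on deviceID and two points (x1, y1) and (x2, y2), where x is the longitude and y the latitude
#!/bin/bash
if [ $# -ne 5 ]; then
    echo "Error! The number of arguments is not correct"
    echo "usage: $0 deviceID lat1 lat2 long1 long2"
    exit 1
fi
# Argument order follows the usage line above
id=$1; lat1=$2; lat2=$3; long1=$4; long2=$5
# Normal case: long1 <= long2
q1="select * from location where (\"deviceID\" = '$id') and ((x+180>=$long1+180) and (x+180<=$long2+180) and (y+90>=$lat1+90) and (y+90<=$lat2+90))"
# Assumed intent of the second query: long1 > long2, the box wraps around the 180 degree meridian
q2="select * from location where (\"deviceID\" = '$id') and ((x+180>=$long1+180) or (x+180<=$long2+180)) and (y+90>=$lat1+90) and (y+90<=$lat2+90)"
# awk compares the longitudes as numbers; bash -gt only handles integers
if awk -v a="$long1" -v b="$long2" 'BEGIN { exit !(a+0 > b+0) }'; then q="$q2"; else q="$q1"; fi
curl -G 'http://localhost:8086/query?pretty=true' --data-urlencode "db=anyplace" --data-urlencode "q=$q"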
InfluxDB tutorial (8)
Drop series: The DROP SERIES query deletes all points from a series in a database, and it drops the series from the index.
DROP SERIES FROM <measurement_name[,measurement_name]> WHERE <tag_key>=<tag_value>
Example: DROP SERIES FROM "epl421" WHERE "user" = 'Tom'
* A successful DROP SERIES query returns an empty result.
InfluxDB tutorial (9)
Delete series: The DELETE query deletes all points from a series in a database. Unlike DROP SERIES, it does not drop the series from the index and it supports time intervals in the WHERE clause.
DELETE FROM <measurement_name> WHERE [<tag_key>='<tag_value>'] | [<time interval>]
Example: DELETE FROM "epl421" WHERE "user" = 'Flash'
* A successful DELETE query returns an empty result.
InfluxDB tutorial (10)
Drop measurement: The DROP MEASUREMENT query deletes all data and series from the specified measurement and deletes the measurement from the index.
DROP MEASUREMENT <measurement_name>
Example: DROP MEASUREMENT "epl421"
* A successful DROP MEASUREMENT query returns an empty result.
InfluxDB tutorial (11)
Create retention policy: CREATE RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> [SHARD DURATION <duration>] [DEFAULT]
Example: CREATE RETENTION POLICY "one_day_only" ON "mydb" DURATION 1d REPLICATION 1
The DURATION clause determines how long InfluxDB keeps the data. The <duration> is a duration literal or INF (infinite). The minimum duration for a retention policy is one hour and the maximum duration is INF.
The REPLICATION clause determines how many independent copies of each point are stored in the cluster.
Modify retention policy: ALTER RETENTION POLICY <retention_policy_name> ON <database_name> DURATION <duration> REPLICATION <n> SHARD DURATION <duration> DEFAULT
By default, the replication factor n usually equals the number of data nodes. However, if you have four or more data nodes, the default replication factor n is 3.
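A hedged sketch of a helper that assembles the CREATE RETENTION POLICY statement and enforces the one-hour minimum before the statement is sent. The function name is hypothetical, and the sketch only accepts INF or single-unit literals in s, m, h, d or w, which is a simplification.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def create_retention_policy(name, database, duration, replication=1, default=False):
    """Return a CREATE RETENTION POLICY statement, rejecting durations under one hour."""
    if duration != "INF":
        match = re.fullmatch(r"(\d+)([smhdw])", duration)
        if match is None:
            raise ValueError(f"unsupported duration literal: {duration}")
        seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
        if seconds < 3600:
            raise ValueError("the minimum retention duration is one hour")
    statement = (f'CREATE RETENTION POLICY "{name}" ON "{database}" '
                 f"DURATION {duration} REPLICATION {replication}")
    return statement + (" DEFAULT" if default else "")

print(create_retention_policy("one_day_only", "mydb", "1d"))
```

The printed statement matches the example above and could be passed to influx -execute or to the /query endpoint.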
References
https://medium.com/schkn/4-best-time-series-databases-to-watch-in-2019-ef1e89a72377
https://medium.com/schkn/sql-is-dead-hail-to-flux-8e8498756049
https://victoriametrics.com
https://medium.com/@valyala/insert-benchmarks-with-inch-influxdb-vs-victoriametrics-e31a41ae2893
https://blog.usejournal.com/open-sourcing-victoriametrics-f31e34485c2b
https://medium.com/@valyala/measuring-vertical-scalability-for-time-series-databases-in-google-cloud-92550d78d8ae
https://github.com/influxdata/influxdb/tree/1.8
https://www.irondb.io/2018/08/tsdbs-at-scale-part-one/
https://www.influxdata.com/?source=post_page-----ef1e89a72377----------------------
https://blog.timescale.com/blog/sql-vs-flux-influxdb-query-language-time-series-database-290977a01a8a/
https://www.influxdata.com/blog/influxdb-outperforms-graphite-in-time-series-data-metrics-benchmark/
https://medium.com/@valyala/high-cardinality-tsdb-benchmarks-victoriametrics-vs-timescaledb-vs-influxdb-13e6ee64dd6b
https://www.slant.co/versus/6000/33519/~influxdb_vs_victoriametrics
https://www.techrepublic.com/article/why-time-series-databases-are-exploding-in-popularity/
References (implementation)
https://docs.influxdata.com/influxdb/v1.7/introduction/installation/
https://anyplace.cs.ucy.ac.cy
https://github.com/dmsl/anyplace/tree/master/server
https://github.com/Solliet/anyplace
https://www.couchbase.com/downloads
https://en.wikipedia.org/wiki/Geohash
https://github.com/Solliet/geohash_sisiphus
Thank you for your attention!
Any questions?
82
https://www.cs.ucy.ac.cy/courses/EPL421

