Questions tagged [impala]
Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.
2,077
questions
58
votes
5
answers
37k
views
How does Impala provide faster query response compared to Hive
I have recently started looking into querying large sets of CSV data lying on HDFS using Hive and Impala. As I was expecting, I get better response time with Impala compared to Hive for the queries I ...
43
votes
2
answers
35k
views
Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)
I want to do some "near real-time" data analysis (OLAP-like) on the data in a HDFS.
My research showed that the three mentioned frameworks report significant performance gains compared to Apache Hive. ...
27
votes
4
answers
82k
views
How to copy all Hive tables from one database to another database
I have a default DB in Hive which contains 80 tables.
I have created one more database and I want to copy all the tables from the default DB to the new database.
Is there any way I can copy from one DB ...
21
votes
3
answers
24k
views
Impala can't access all Hive tables
I'm trying to query HBase data through Hive (I'm using Cloudera). I created a few Hive external tables pointing to HBase, but Cloudera's Impala doesn't have access to all of those tables. All Hive ...
16
votes
3
answers
43k
views
Difference between invalidate metadata and refresh commands in Impala?
I saw the following at this link, which applies to Impala version 1.1:
Since Impala 1.1, REFRESH statement only works for existing tables. For new tables you need to issue "INVALIDATE METADATA" statement.
Does this ...
14
votes
1
answer
22k
views
How to calculate seconds between two timestamps in Impala?
I do not see an Impala function to subtract two datestamps and return seconds (or minutes) between the two.
http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/...
14
votes
2
answers
2k
views
How to efficiently move data from Kafka to an Impala table?
Here are the steps to the current process:
Flafka writes logs to a 'landing zone' on HDFS.
A job, scheduled by Oozie, copies complete files from the landing zone to a staging area.
The staging data ...
12
votes
1
answer
2k
views
How to efficiently update Impala tables whose files are modified very frequently
We have a Hadoop-based solution (CDH 5.15) where we are getting new files in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files in ...
12
votes
7
answers
20k
views
RODBC ERROR: Could not SQLExecDirect in mysql
I have been trying to write an R script to query an Impala database. Here is the query to the database:
select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from ...
11
votes
3
answers
73k
views
Convert YYYYMMDD String to Date in Impala
I'm using SQL in Impala to write this query. I'm trying to convert a date string, stored in YYYYMMDD format, into a date format for the purposes of running a query like this:
SELECT datadate,
...
11
votes
3
answers
13k
views
How does computing table stats in hive or impala speed up queries in Spark SQL?
To increase performance (e.g. for joins) it is recommended to compute table statistics first.
In Hive I can do:
analyze table <table name> compute statistics;
In Impala:
compute stats <...
11
votes
2
answers
15k
views
Write pandas table to impala
Using the impyla module, I've downloaded the results of an impala query into a pandas dataframe, done analysis, and would now like to write the results back to a table on impala, or at least to an ...
11
votes
2
answers
1k
views
Big data signal analysis: better way to store and query signal data
I am about to do some signal analysis with Hadoop/Spark and I need help on how to structure the whole process.
Signals are now stored in a database, that we will read with Sqoop and will be ...
11
votes
1
answer
1k
views
Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?
We are trying to do a proof of concept on Informatica Big Data edition (not the cloud version) and I have seen that we might be able to use HDFS and Hive as source and target. But my question is does ...
9
votes
1
answer
10k
views
Impala command to know DB table size
Is there any way that we can check the DB table size and other properties? I tried COMPUTE STATS, but it gives the details of the table except the size.
Any link to find information and other details are ...
9
votes
2
answers
12k
views
Create table from CSV with values containing commas enclosed in quotes
I'm trying to create a table in Impala from a CSV that I've uploaded into an HDFS directory. The CSV contains values with commas enclosed inside quotes.
Example:
1.66.96.0/19,"NTT Docomo,INC.","...
9
votes
2
answers
458
views
Is there a way to turn off DESCRIBE in R dplyr sql
I'm using R shiny and dplyr to connect to a database and query the data in Impala. I do the following.
con <- dbPool(odbc(),
Driver = [DRIVER],
Host = [HOST],
Schema = [SCHEMA],
Port = [PORT],
UID =...
8
votes
2
answers
10k
views
How do I set a variable in an Impala query using HUE?
I need to add parameters in several locations in a long query. I want to use parameters because I need to run the query multiple times with different values substituted in. This is very cumbersome ...
8
votes
3
answers
41k
views
Get sequential number of a row (rank) within a partition without using ROW_NUMBER() OVER function
I need to rank rows by partition (or group), i.e. if my source table is:
NAME PRICE
---- -----
AAA 1.59
AAA 2.00
AAA 0.75
BBB 3.48
BBB 2.19
BBB 0.99
BBB 2.50
I would like to get target table:
...
8
votes
1
answer
342
views
Running impala cluster from portable binaries
I'm evaluating multiple big data tools. One of them is of course Impala.
I would like to start an Impala cluster by manually starting processes on the cluster nodes. As I'm currently doing for Spark, H2O,...
7
votes
2
answers
11k
views
How to duplicate cloudera impala table with impala-shell or other means?
I see a table "test" in Impala when I do show tables;
I want to make a copy of the "test" table so that it is an exact duplicate, but named "test_copy". Is there an Impala query I can execute to do ...
7
votes
1
answer
35k
views
Dropping multiple partitions in Impala/Hive
1- I'm trying to delete multiple partitions at once, but struggling to do it with either Impala or Hive. I tried the following query, with and without ':
ALTER TABLE cz_prd_corrti_st....
7
votes
1
answer
23k
views
Difference in days between two dates in Impala
I am trying to find a date difference in Impala. I have tried a few options. My most recent is below:
ABS(dayofyear(CAST(firstdate AS TIMESTAMP)-dayofyear(CAST(seconddate AS TIMESTAMP)
an example of ...
7
votes
4
answers
48k
views
ROW_NUMBER( ) OVER in impala
I have a use case where I need to use ROW_NUMBER() over PARTITION:
Something like:
SELECT
Column1 , Column 2
ROW_NUMBER() OVER (
PARTITION BY ACCOUNT_NUM
ORDER BY FREQ, MAN, MODEL) as ...
7
votes
5
answers
10k
views
Will Spark SQL completely replace Apache Impala or Apache Hive? [closed]
I need to deploy a Big Data cluster on our servers, but I only know Apache Spark. Now I need to know whether Spark SQL can completely replace Apache Impala or Apache Hive.
I need ...
7
votes
2
answers
28k
views
extract the date from a timestamp value variable in Impala
How can I extract the date from a timestamp value variable in Impala?
e.g. time = 2018-04-11 16:05:19 should be 2018-04-11
7
votes
2
answers
14k
views
Uploading CSV for Impala
I am trying to upload a CSV file to HDFS for Impala and have failed many times. I'm not sure what is wrong here, as I have followed the guide. The CSV is also on HDFS.
CREATE EXTERNAL TABLE gc_imp
...
7
votes
2
answers
41k
views
Impala: Show tables like query
I am working with Impala and fetching the list of tables from the database with some pattern like below.
Assume I have a database named bank, and the tables under this database are as below.
cust_profile
...
7
votes
3
answers
5k
views
How to find the COMPRESSION_CODEC used on a Parquet file at the time of its generation?
Usually in Impala, we use the COMPRESSION_CODEC before inserting data into a table for which the underlying files are in Parquet format.
Commands used to set COMPRESSION_CODEC:
set ...
7
votes
1
answer
4k
views
Impala cannot find com.mysql.jdbc.Driver
I'm trying to set up Cloudera Impala with CDH4 in pseudo distributed mode on Red Hat 5. I have Hive using JDBC to connect to a MySQL metastore, but I'm having trouble setting up Impala with JDBC. I've ...
6
votes
2
answers
8k
views
Installing cloudera impala without cloudera manager
Kindly provide a link for installing Impala on Ubuntu without Cloudera Manager. I wasn't able to install it using the official link.
Unable to locate package impala using these queries:
sudo apt-...
6
votes
1
answer
13k
views
Calling JDBC to Impala/Hive from within a Spark job and creating a table
I am trying to write a Spark job in Scala that would open a JDBC connection with Impala and let me create a table and perform other operations.
How do I do this? Any example would be of great ...
6
votes
1
answer
11k
views
Impala - convert existing table to Parquet format
I have a table that has partitions and I use Avro files or text files to create and insert into a table.
Once the table is done, is there a way to convert it into Parquet?
I mean I know we could have ...
6
votes
3
answers
14k
views
Save Impala Shell query results in CSV
How can I save my query results in a CSV file via the Impala Shell?
My Code:
impala-shell -q "use test;
select * from teams;
-- From this point I need to save the query results to /Desktop (for ...
6
votes
4
answers
13k
views
Comma delimited string to individual rows - Impala SQL
Let's suppose we have a table:
Owner | Pets
------------------------------
Jack | "dog, cat, crocodile"
Mary | "bear, pig"
I want to get as a result:
Owner | Pets
------------------------...
6
votes
2
answers
5k
views
Performance of Apache Drill
Are there any performance benchmarks (genuine ones) that compare Stinger vs Impala vs Drill? Also, which is preferred - my use case will be mainly towards ad-hoc interactive queries on top of Hive. ...
6
votes
1
answer
21k
views
Jdbc settings for connecting to Impala
What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)?
I have tried a few including:
jdbc:hive2://myserver:21050/;auth=noSasl
And with the following driver:
org.apache....
6
votes
3
answers
8k
views
Custom SerDe not supported by Impala, what's the best way to query files in CSV w/double quotes?
I have CSV data with each field surrounded by double quotes. When I created
the Hive table I used the SerDe 'com.bizo.hive.serde.csv.CSVSerde'.
When the above table is queried in Impala I get the error SerDe ...
6
votes
2
answers
469
views
Immediate evaluation of CTE
I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct ...
6
votes
1
answer
11k
views
Impala/Hive to get list of tables along with its size
I have used a query in Oracle DB to produce the list of tables in a database along with their owner and respective table size. Here is the sample query I have shared.
select owner, table_name, round((...
6
votes
2
answers
34k
views
How to set configuration in Hive-Site.xml file for hive metastore connection?
I want to connect to the metastore using Java code. I have no idea how to set the configuration in the Hive-Site.xml file or where to place the Hive-Site.xml file. Please help.
import java.sql....
6
votes
1
answer
11k
views
Implement CREATE AS SELECT in Impala
Please help me with how to implement CREATE TABLE AS SELECT.
For a simple create table t1 as select * from t2; I can implement it as
Create table t1 like t2;
insert into t1 select * from t2;
But how to ...
6
votes
0
answers
1k
views
How to use Impala to read Hive view containing complex types?
I have some data that is processed and modeled based on case classes, and the classes can also have other case classes in them, so the final table has complex data: struct, array. Using the case class I ...
5
votes
4
answers
6k
views
Presto vs Impala: architecture, performance, functionality
Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance?
For some reason this excellent question was tagged as ...
5
votes
3
answers
12k
views
Invalidate metadata/refresh Impala from Spark code
I'm working on an NRT solution that requires me to frequently update the metadata on an Impala table.
Currently this invalidation is done after my Spark code has run.
I would like to speed things up ...
5
votes
3
answers
8k
views
Cloudera Impala INVALIDATE METADATA
As has been discussed in Impala tutorials, Impala uses a metastore shared with Hive, but it has been mentioned that if you create or edit tables using Hive, you should execute INVALIDATE ...
5
votes
2
answers
12k
views
Is there a way to show partitions on Cloudera impala?
Normally, I can do show partitions <table> in Hive. But when it is a Parquet table, Hive does not understand it. I can go to HDFS and check the dir structure, but that is not ideal. Is there any ...
5
votes
2
answers
2k
views
Hive/Impala performance with string partition key vs Integer partition key
Are numeric columns recommended for partition keys? Will there be any performance difference when we do a select query on numeric column partitions vs string column partitions?
5
votes
3
answers
4k
views
Loading a large CSV in Hadoop via Hue only stores a 64MB block
I'm using the Cloudera quickstart VM 5.1.0-1
I'm trying to load my 3GB CSV in Hadoop via Hue and what I tried so far is:
- Load the csv into the HDFS and specifically into a folder called datasets ...
5
votes
2
answers
2k
views
Impala on Hadoop 2.2.0 without CDH?
I want to test and configure Impala with my Hadoop 2.2.0 distribution, not Cloudera's.
I want to know if it's possible to use Impala without CDH, because I have only read that Impala is CDH dependent.
...