Outline of Q&A Themes and Groups
Beginner
- Where can you find Impala? Answer
- How many ways are there to get a command prompt for Impala? Answer
- What do you need to know to get around in impala-shell? Answer
- How many databases are there in your Impala instance? Answer
- How do you approach a table you don't understand very well? Answer
- How do I get data into a table? Answer
Extract, Transform, Load (ETL) Considerations
- What does the ETL cycle look like? Answer
- How is X, Y, or Z represented in a text-format table? (Numbers, Booleans, date/times, NULLs.) Answer
- What is the smallest non-zero amount of data you could possibly put into an Impala table? Answer
- How could I synthesize some data to use? Answer
- How do you deal with input text files in varied or inconsistent formats? Answer
- What are all the kinds of nonexistent or special values? Answer
- What are the ways data files could be laid out? Answer
- What are the considerations for file formats? Answer
Hadoop and Big Data
- What are the considerations for file formats? Answer
- How can I interface with Impala data through other Hadoop tools? Answer
- Discuss what's really happening inside aggregation function calls. How does GROUP BY change things? Answer
- Isn't it great, I have this table with a million rows! Answer
- What are the ramifications of cluster topology? Answer
- What are some of the ways a table can be "big"? Answer
- What are some ways "big data" use cases differ from traditional database work? Answer
- What are the differences between my dev/test system and a production environment? Answer
- What are the implications of normalized vs. denormalized data? Answer
- What are the ramifications as cluster size increases? Answer
- What problems could arise if the data is too small? Answer
- What's wrong with SELECT * FROM t1 in a Hadoop / Big Data context? Answer
- Why would you use Impala vs. Hive? Answer
Migrating or Porting from Other Databases
- Where is my favorite feature from RDBMS X? Transactions, triggers, constraints... Answer
- How about indexes, where did those go? Answer
- What are the number ranges for Impala data types? Answer
- What things could trip you up with numbers? Answer
- What things could trip you up with strings? Answer
- What things could trip you up with dates and times? Answer
- What things could trip you up with sorting? Answer
- How do you know a column is unique? Has no NULLs? Only contains a specific set of values? Answer
SQL Tips and Tricks
- How do I put a table in a specific database? Answer
- How many ways could you have a table with no data (0 rows) in Impala? Answer
- How do I empty out an existing table? Answer
- What kinds of data types can I use in tables? Answer
- You have data with dates. How can you partition by year, month, and day? Answer
- How do I work with different date formats? Answer
- How do you recover if you made a mistake in your schema definition? (Schema evolution.) Answer
- How many ways could you use Impala to calculate 2+2? Answer
- In what circumstances are NULLs allowed in the schema definition? Answer
- What are all the places where you have to be careful about reserved words? Answer
- What are all the things you can quote, and how do you quote them? Identifiers, Literals
- When are you allowed or required to alias things? Answer
- When do I have to cast data types? Answer
- You have a table T1 with N rows. How do you control how many rows are in the result set? Answer
- You have a table with N columns. How do you control how many columns are in the result set? Answer
- What analytic functions does Impala have? Answer
- How do I script Impala from the shell? Answer
- You enter SELECT COUNT(*) FROM t1 and get an "unknown table" error. Why could that be? Answer
- Why could you access a column in a table today but not tomorrow? Answer
Performance and Scalability
- What are the considerations for file formats? Answer
- What are all the ways I can understand the performance and scalability of a query? Answer
- How are joins different or special for Hadoop versus traditional databases? Answer
- I run a query and it seems slow. How do I figure out what to do? Answer
- INSERT INTO t1 SELECT * FROM raw_data - What performance and scalability considerations are there for a copy operation like this? Answer
- What are some trouble signs to look for in super-complicated queries? Answer
- What are the ramifications of compression? Answer
- What could be bottlenecks slowing down a big query? Answer