Outline of Q&A Themes and Groups

Beginner

What does the ETL cycle look like? Answer
How is X, Y, or Z represented in a text-format table? (Numbers, Booleans, date/times, NULLs.) Answer
What is the smallest non-zero amount of data you could possibly put into an Impala table? Answer
How could I synthesize some data to use? Answer
How do you deal with input text files in varied or inconsistent formats? Answer
What are all the kinds of nonexistent or special values? Answer
What are the ways data files could be laid out? Answer
What are the considerations for file formats? Answer

What are the considerations for file formats? Answer
How can I interface with Impala data through other Hadoop tools? Answer
Discuss what's really happening inside aggregation function calls. How does GROUP BY change things? Answer
Isn't it great, I have this table with a million rows! Answer
What are the ramifications of cluster topology? Answer
What are some of the ways a table can be "big"? Answer
What are some ways "big data" use cases differ from traditional database work? Answer
What are the differences between my dev/test system and a production environment? Answer
What are the implications of normalized vs. denormalized data? Answer
What are the ramifications as cluster size increases? Answer
What problems could arise if the data is too small? Answer
What's wrong with SELECT * FROM t1 in a Hadoop / Big Data context? Answer
Why would you use Impala vs. Hive? Answer

Where is my favorite feature from RDBMS X? Transactions, triggers, constraints... Answer
How about indexes, where did those go? Answer
What are the number ranges for Impala data types? Answer
What things could trip you up with numbers? Answer
What things could trip you up with strings? Answer
What things could trip you up with dates and times? Answer
What things could trip you up with sorting? Answer
How do you know a column is unique? No nulls? Only contains a specific set of values? Answer

How do I create a table in a specific database? Answer
How many ways could you have a table with no data (0 rows) in Impala? Answer
How do I empty out an existing table? Answer
What kinds of data types can I use in tables? Answer
You have data with dates. How can you partition by year, month, and day? Answer
How do I work with different date formats? Answer
How do you recover if you made a mistake in your schema definition? (Schema evolution.) Answer
How many ways could you use Impala to calculate 2+2? Answer
In what circumstances are NULLs allowed in the schema definition? Answer
What are all the places where you have to be careful about reserved words? Answer
What are all the things you can quote and how to quote them? Identifiers, Literals
When are you allowed or required to alias things? Answer
When do I have to cast data types? Answer
You have a table T1 with N rows. How do you control how many rows are in the result set? Answer
You have a table with N columns. How do you control how many columns are in the result set? Answer
What analytic functions does Impala have? Answer
How do I script Impala from the shell? Answer
You enter SELECT COUNT(*) FROM t1 and get an "unknown table" error. Why could that be? Answer
Why could you access a column in a table today but not tomorrow? Answer

What are the considerations for file formats? Answer
What are all the ways I can understand the performance and scalability of a query? Answer
How are joins different or special for Hadoop versus traditional databases? Answer
I run a query and it seems slow. How do I figure out what to do? Answer
INSERT INTO t1 SELECT * FROM raw_data - What performance and scalability considerations are there for a copy operation like this? Answer
What are some trouble signs to look for in super-complicated queries? Answer
What are the ramifications of compression? Answer
What could be bottlenecks slowing down a big query? Answer