Outline of Q&A Themes and Groups

Beginner

  1. Where can you find Impala? Answer
  2. How many ways are there to get a command prompt for Impala? Answer
  3. What do you need to know to get around in impala-shell? Answer
  4. How many databases are there in your Impala instance? Answer
  5. How do you approach a table you don't understand very well? Answer
  6. How do I get data into a table? Answer

Extract, Transform, Load (ETL) Considerations

  1. What does the ETL cycle look like? Answer
  2. How is X, Y, or Z represented in a text-format table? (Numbers, Booleans, date/times, NULLs.) Answer
  3. What is the smallest non-zero amount of data you could possibly put into an Impala table? Answer
  4. How could I synthesize some data to use? Answer
  5. How do you deal with input text files in varied or inconsistent formats? Answer
  6. What are all the kinds of nonexistent or special values? Answer
  7. What are the ways data files could be laid out? Answer
  8. What are the considerations for file formats? Answer

Hadoop and Big Data

  1. What are the considerations for file formats? Answer
  2. How can I interface with Impala data through other Hadoop tools? Answer
  3. Discuss what's really happening inside aggregation function calls. How does GROUP BY change things? Answer
  4. Isn't it great, I have this table with a million rows! Answer
  5. What are the ramifications of cluster topology? Answer
  6. What are some of the ways a table can be "big"? Answer
  7. What are some ways "big data" use cases differ from traditional database work? Answer
  8. What are the differences between my dev/test system and a production environment? Answer
  9. What are the implications of normalized vs. denormalized data? Answer
  10. What are the ramifications as cluster size increases? Answer
  11. What problems could arise if the data is too small? Answer
  12. What's wrong with SELECT * FROM t1 in a Hadoop / Big Data context? Answer
  13. Why would you use Impala vs. Hive? Answer

Migrating or Porting from Other Databases

  1. Where is my favorite feature from RDBMS X? Transactions, triggers, constraints... Answer
  2. How about indexes, where did those go? Answer
  3. What are the number ranges for Impala data types? Answer
  4. What things could trip you up with numbers? Answer
  5. What things could trip you up with strings? Answer
  6. What things could trip you up with dates and times? Answer
  7. What things could trip you up with sorting? Answer
  8. How do you know a column is unique? No nulls? Only contains a specific set of values? Answer

SQL Tips and Tricks

  1. How do I put a table into a specific database? Answer
  2. How many ways could you have a table with no data (0 rows) in Impala? Answer
  3. How do I empty out an existing table? Answer
  4. What kinds of data types can I use in tables? Answer
  5. You have data with dates. How can you partition by year, month, and day? Answer
  6. How do I work with different date formats? Answer
  7. How do you recover if you made a mistake in your schema definition? (Schema evolution.) Answer
  8. How many ways could you use Impala to calculate 2+2? Answer
  9. In what circumstances are NULLs allowed in the schema definition? Answer
  10. What are all the places where you have to be careful about reserved words? Answer
  11. What are all the things you can quote, and how do you quote them? Identifiers, Literals
  12. When are you allowed or required to alias things? Answer
  13. When do I have to cast data types? Answer
  14. You have a table T1 with N rows. How do you control how many rows are in the result set? Answer
  15. You have a table with N columns. How do you control how many columns are in the result set? Answer
  16. What analytic functions does Impala have? Answer
  17. How do I script Impala from the shell? Answer
  18. You enter SELECT COUNT(*) FROM t1 and get an "unknown table" error. Why could that be? Answer
  19. Why could you access a column in a table today but not tomorrow? Answer

Performance and Scalability

  1. What are the considerations for file formats? Answer
  2. What are all the ways I can understand the performance and scalability of a query? Answer
  3. How are joins different or special for Hadoop versus traditional databases? Answer
  4. I run a query and it seems slow. How do I figure out what to do? Answer
  5. INSERT INTO t1 SELECT * FROM raw_data - What performance and scalability considerations are there for a copy operation like this? Answer
  6. What are some trouble signs to look for in super-complicated queries? Answer
  7. What are the ramifications of compression? Answer
  8. What could be bottlenecks slowing down a big query? Answer