Here I’m talking to someone trying to break into “Data” as a career field in a small to medium sized company. This might mean moving from help desk or marketing or customer service to DBA, Data Analyst, or Data Engineer. These data jobs reflect a LOT of overlap when you work in smaller companies, as I have mostly done. The reality is you might find yourself called a Data Analyst, Data Engineer, or DBA. But more likely you’ll be something like a Business Intelligence Data Warehouse Analytics Engineer… or even a new-fangled “Analytics Engineer”.
Titles are both fluid and vague in small to midsize businesses, so YMMV. I was lucky enough to go from History major to Senior Data Engineer on a long and winding road, with stops along the way in Business Intelligence, DBA, DW engineering and others.
Most of this advice is geared toward the SQL Server ecosystem. First, because this is where I started. The company I was at used SQL Server RDBMS and its BI Tools. Second, SQL Server comes with “batteries included”. Meaning you get the database, a development environment (SSMS), a reporting tool (SSRS) and an ETL Tool (SSIS). I always keep a virtual machine (VMWARE workstation or VirtualBox) with SQL Server, SSRS and SSIS installed for learning purposes. This means I have everything I need to spend a Saturday learning and breaking things!
So here is a brief-ish synopsis of how, based on my experience, I might recommend someone else navigate this journey .
Learn SQL. Well.
Start out to learn some general principles of RDBMS (relational database management systems), and then after working on SQL skills, drill into the specifics of whatever RDBMS you are going to dig into. The Fundamentals book by Itzik Ben-Gan will provide you a brief and useful introduction. Don’t lose the forest for the trees at this stage… you are looking to get build some mental velcro so later knowledge can stick.
What should you be sure to learn? Joins, Analytic/Window functions, CTEs, subqueries, TVFs and stored procedures should all be on the menu. One helpful thing to keep in mind: There IS a difference between SQL QUERYING and SQL DEVELOPMENT. I don’t think this is a distinction without a difference. For analytics, most of what you are doing is figuring out the best way to get data OUT of the database. By “best” I mean 1) correct and accurate and 2) performant. Aka FAST! And fast means time to delivery as well as speed with which the query returns results.
This is opposed to SQL development, where you are designing structures with software in mind, focusing on OLTP type stuff with which customer facing apps might concern themselves. And OFTEN that focus is getting data INTO the application database as efficiently and effectively as possible. The line does start to blur as you move more into Data Engineering, but we can talk about that in another post.
- Just starting out, read TSQL Fundamentals, 3rd Edition. You will gain SO much from reading just the first 6 chapters. Of course you can’t just READ and hope to get anything out of it. You MUST do the exercises and get your hands dirty. You will be amazed how quickly you develop skills in this active manner.
- Assuming you are going to be doing some amount of reporting/analysis, go for TSQL Window Functions, 2nd Edition next. This book will give you the super power of things like LAG/LEAD, partitions, etc. You will buzz with god-like powers when you have gained some experience here.
- When you are moving along nicely, and you’ve been some useful reports and one-off queries, its time for TSQL Querying, Developer Reference. Big, bold and beneficial to use as reference for the rest of your career!
Learn one RDBMS. Well.
Understand the fundamentals and more of indexing in your RDBMS of choice. Again I am partial to SQL Server, so my recommendations reflect that fact. There is nothing better I have found than Brent Ozar’s training for both fundamental and advanced index strategies and tuning. To get started you can download SQL Server Developer edition, along with SSMS and SQL Server data tools here.
Focus on understanding the engine/optimizer. The engine is the brains of the RDBMS and determines the HOW of returning the data you are seeking. Your time will be well spent starting with this video by Brent Ozar about understanding how the SQL Server engine actually works.
Learn some query performance tuning. This means not only theory but the “how” of things like reading and understanding execution/sql plans, estimates, etc. Everything by Brent Ozar is great here as well. If you are more of a book person, a recent addition to the body of knowledge is this book by Benjamin Nevarez.
Get familiar with data warehousing concepts
There are a lot of resources on the interwebs, but the holy grail IMHO is the Dimensional Modeling bible called The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition. Written by Ralph Kimball, the father of the most prominent DW modeling scheme, you can go from zero to hero by reading and working through this book. It focuses on the dimensional data warehouse model, which is still highly relevant in my experience. I have seen star schemas front and center in the analytics departments at all the companies I have worked in. Despite various fads, these principles will be important for many years to come.
Spend some time with a reporting tool
This might be Tableau, Domo, PowerBI or Reporting Services (SSRS). Or any of a myriad of other tools. I started with SSRS, and it taught me the basics of building table based reports, as well as throwing together some basic dashboards. One approach is to take the more complex queries you worked on in the Window functions book, and build those into various SSRS reports. I like this approach, as it builds on the knowledge of the queries and the data you have already been over. Or you might talk with some of the business folk in your organization and see if there is some juicy low-hanging fruit you might be able to pick, and create a win for them and you!
The book I used for coming up to speed was for SQL Server 2012, but there is a newer version for 2016. The thing I love about the book is that it is all case-study based. This style of learning works best for me, and hopefully it will be good for you too! The key to making it work is to dig into the examples and then apply them to some data challenge at work.
Learn a programming language
Probably Python. It is just so ubiquitous in the data world it can’t be ignored. I recommend starting with Think Python. It is free and great! It will walk you through not only Python the language in particular, but programming in general. You will get into the mindset of a programmer. For a paid resource that offers WAY more than just Python training, check out DataCamp. So much data goodness to learn on this platform!
Learn an ETL tool
Anything will do to get you learning. For me it was SSIS (SQL Server Integration Services) because my company at that time was a Microsoft shop. My favorite resource for getting going is this book by Brian Knight et al. Also given the importance of the cloud now, if you are SQL Server focused, Azure Data Factory is easy to get going. It is fairly straight forward and yet extremely powerful as you grow your skills. Also there are a ton of great ADF videos on YouTube to get you started in your learning journey.
Make friends with business people in your company.
Especially Marketing and Product folks. If you are in a small company, they can reveal low hanging fruit that you might be able to turn into delicious preserves. I got to know my company’s Product Manager and Marketing manager. I took them to coffee or lunch, and got to know what pain points they faced in their daily jobs. Having been involved in the DBA side for a year or so in that company, I had a good idea of where to look for the data they were missing. From there, it was a matter of some TSQL and a simple SSRS (SQL Server Reporting services) report, and they were over the moon with joy.
Never Stop Learning!