As any data analyst will tell you, exploring your dataset with SQL is often one of the first steps you take to uncover hidden insights. What is this query actually doing? Can you take a guess? Here, we're counting the total number of trips taken in a public dataset of San Francisco bikeshare trips. Now let's explore this public dataset together in greater detail with a quick demonstration. It's time to dust off your SQL, or Structured Query Language, skills as we explore some public datasets inside of Google BigQuery. Let's dive right in. In this demo, the way I got to this public BigQuery dataset, the San Francisco bikeshare trips table we were looking at before, is through the publicly available BigQuery public data project. Now, you might not have that inside of your Google Cloud account or your own personal account. You can get those datasets by clicking "Add Data", then "Explore public datasets", choosing one of them, and it brings you right back into the console. I'm going to show you a couple of nifty things that I've picked up inside of BigQuery over the years. Now you have a ton of BigQuery public datasets that you can experiment with. Once you find one you like, you can do a lot just knowing the name of the table. We're not going to write any SQL yet; we've got the table name. Notice that it has hyphens in it: the BigQuery public data project is named with hyphens. That's when you need backticks. Technically you only need them around the project name, but that's when you see those backticks, that character there, come into play. If you don't have hyphens anywhere in the name, you actually don't need them. That's the first tip for you. The second tip is one of my absolute favorites: if you hold down the Command key (or Ctrl on Windows) and hover over the query, it'll highlight all the table names in your query. Why is that useful?
Because as soon as it's highlighted, you can click on it. If you have a bunch of different datasets you just want to explore, or you inherited a query from somebody else, you can quickly get to the schema, the details, and the preview of that particular table. We're trying to get as far as we can without typing any keystrokes in here. Let's take a look at the schema. This is San Francisco bikeshare trips: you've got things like the trip ID, everything we could possibly want, the duration in seconds, when the trip started, and which stations were involved. Let your mind wander over some of the cool insights you could pull out. There's geographic data too, start and end latitude and longitude, so maybe some of the core GIS functions could come into play. Let's take a look at how many rows we have. We don't have to run a COUNT(*) or anything like that: the details tab shows almost 2 million rows and about 375 megabytes. As a best practice, you don't want to run a SELECT * FROM a table with a LIMIT 10 when just looking at the preview would suffice. You've seen the schema and the data types; now you can go in, look at the total row count, and see sample data values. You can see which stations trips started and stopped at, everything. That'll get you familiar with the data. If you want to query the table, you can click "Query Table". It will warn you, hey, I'm going to overwrite everything in the query editor. Sure, that's fine. It generates a skeleton: SELECT some columns FROM our bikeshare trips table. Here's a neat tip I only figured out recently, and props to the BigQuery team. Say I just want the total number of trips by the station name they started from. Especially if you have field names that are really long, or if you don't want to type things in like I do in a live demo, you can just click on the field names and it'll add them to the query. It'll even add the commas for you, which is cool.
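To illustrate, the kind of starter query the console builds up as you click field names might look something like this. I'm assuming the `san_francisco_bikeshare.bikeshare_trips` table from the demo; note the backticks around the fully qualified name, needed because the project name contains hyphens:

```sql
-- Backticks are required here because the project name,
-- bigquery-public-data, contains hyphens.
SELECT
  trip_id,
  start_station_name
FROM
  `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
LIMIT 10;
```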
Then, inside of BigQuery, you can format the query. Even if you're experienced with SQL, it's still fun to see how much you can do without typing anything into the editor. I'm going to run that query. It doesn't really do much for us, because it just gives us the station name; it's still at the granularity of the individual trip. So what's one of the things you can do inside of BigQuery? Looking at that trip ID, we can apply an aggregate function, like a COUNT, over all trips. It doesn't matter that it's just counting rows; you could do COUNT(*), and some people like to put COUNT(1) or something like that. For readability, I'm going to keep the COUNT of trip_id, just so somebody else inheriting my code can very quickly see the level of granularity of this table, and we'll call it the number of trips. Now, naturally, you're immediately going to get an error. If you've worked with SQL long enough, you know that as soon as you aggregate one field, all the rest of your fields had better be aggregated or grouped as well. If it's just been a late night, you can open up the validator and it'll tell you: hey, this query references start_station_name, which is neither grouped nor aggregated. Then naturally, what you want to do is add the GROUP BY. That's how we say: all right, we're rolling up all the trips into a single value, so let's group those by each of the different stations. Now, if you remember your SQL, what's one of the things you can do to get the most popular stations? You can do an ORDER BY, which is a sort, and you can order by the alias, as we see here. Note that you can't filter in a WHERE clause by an aliased field, because the alias doesn't exist yet when the query engine evaluates the WHERE clause; keep that in mind. That's where you can use things like temporary tables.
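Putting those pieces together, the aggregated query sketched in this step might look like the following (field names follow the schema shown in the demo):

```sql
SELECT
  start_station_name,
  -- Counting trip_id documents the table's granularity: one row per trip.
  COUNT(trip_id) AS num_trips
FROM
  `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
GROUP BY
  start_station_name   -- every non-aggregated column must be grouped
ORDER BY
  num_trips DESC;      -- ORDER BY can reference the alias; WHERE cannot
```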
That orders by the highest number of trips first. What did we say, the top ten stations or something like that? We'll add a LIMIT and make sure to format the query. If you really want to be in everyone's good favor, you can add a comment at the top, something like "top 10 stations by volume". Go ahead and run that, and we see the most popular stations. We've got the San Francisco Caltrain station, and I can definitely vouch for this one: as soon as you get off the train, you need to get somewhere in San Francisco, and it's got 72,000 trips. Now, if you want to experiment a little bit more, you can add in a filter and just say, hey, I'm only looking for trips from 2018. There's another field; let's go back, because honestly I've already forgotten the field name. I hold down that key again, and I pop into the start date. Let's take a look: start date is a TIMESTAMP. So let's filter where the start date is after, what did we say, 2018? We can use the end of 2017, 2017-12-31. You could explicitly convert the value if you wanted to, but hopefully the comparison will just work automatically; there are a lot of date functions and extractions you can use. Let's see, I think it was about 70,000 trips for the Caltrain before. Let's see if the Caltrain is still number one. Oh, look, in the last year the Caltrain has actually been dethroned. It's the Ferry Building, a super popular tourist spot if you haven't been to San Francisco yet, and that takes the highest number of trips just for 2018. Doing this kind of aggregation and filtering inside of BigQuery is fun for these types of insights. But often an easier way to do it is by actually exporting the data, or linking it directly to a front-end visualization tool like Data Studio. You can say: I don't actually want to limit the data here.
I want to throw all the data into the visualization tool, and then just have a filter there for the users looking at the results. If I wanted to store this common query, I could do a couple of different things. I could save the query as "my query" inside of my project, either just for me or for everyone who can see the project. Or I could save it as a view. But what I like to do, since I commonly commit all my code to version control, is write a little bit of data definition language in SQL, SQL DDL. I want the creation statement inside of the actual code itself, so if people wonder, hey, where on earth did you get this top-stations-for-2018 table, the answer is right there. And actually, it's not just 2018, it's anything after 2017 as well; for the people yelling at their screens saying, hey, that's not just 2018, I've got you. CREATE OR REPLACE TABLE; let's give it a table name. We need a dataset. Do we even have a dataset? I don't think so. We need to have a dataset first before just dumping things in here. We'll call it bike_insights, something like that; a dataset is just a collection of tables. Now we have an empty dataset with no tables in it, so we can start populating it. We have bike_insights, so CREATE OR REPLACE TABLE in that dataset, and we'll just call the table top_trips_2018_and_beyond. Try to keep your table names a little more succinct than mine. Then, once you've created it, you should automatically see it show up. Boom, there you have it, and the query isn't going to re-run every single time. Now, you might be asking: what happens if the BigQuery public data source updates after this? That's an excellent point. This just dumped all of the data in here once, so your table is going to be static now.
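As a sketch, the DDL described here might look like the following. The dataset and table names (`bike_insights`, `top_trips_2018_and_beyond`) are the ones improvised in the demo, and the result is a one-time, static snapshot of the query output:

```sql
-- Top 10 stations by trip volume, for trips starting after 2017-12-31.
CREATE OR REPLACE TABLE bike_insights.top_trips_2018_and_beyond AS
SELECT
  start_station_name,
  COUNT(trip_id) AS num_trips
FROM
  `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE
  start_date > '2017-12-31'  -- technically anything after 2017, not just 2018
GROUP BY
  start_station_name
ORDER BY
  num_trips DESC
LIMIT 10;
```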
One of the things you can do, if you're looking to link this out to your dashboard, is to create a view instead of a table, a logical view. We'll just call this one "view", since BigQuery says, hey, that table object already exists. Notice the icon changes, and you can click into it. But if you try to preview the data inside of the view, the preview tab is gone. Why is that? Because the view doesn't store any data. A view in SQL is essentially just a stored query. If you query the view, BigQuery just runs the query we stored a little bit earlier. Quick high-level recap: you've seen datasets, you've seen public datasets, and you can explore them at your leisure with some of the new tips and tricks you've seen. If you want to edit that view later, you can scroll down, bring the query back into the editor, look back at the schema, and keep exploring as you see fit. This is just the tip of the iceberg when it comes to data analysis; we just did a simple count of trips by the most popular station names. There are also datasets in there for weather, so you could do a further analysis. We have bikeshare datasets for New York and San Francisco: see where weather affects the riders the most, or how the bikeshare program is doing as a whole when it comes to seasonality. Is there a lot more or less ridership in some of the winter months? You can check all of that for yourself, and you can do it all with SQL without having to worry about building any of the infrastructure behind the scenes. It's all managed for you inside of BigQuery. All right, that's a wrap.
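For reference, the view variant discussed above swaps one keyword and stores only the query, not the data, so it never goes stale. The `_vw` suffix is hypothetical here, since the table name was already taken in the demo:

```sql
-- A view stores the query itself; results reflect the source
-- table at query time, so the dashboard always sees fresh data.
CREATE OR REPLACE VIEW bike_insights.top_trips_2018_and_beyond_vw AS
SELECT
  start_station_name,
  COUNT(trip_id) AS num_trips
FROM
  `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE
  start_date > '2017-12-31'
GROUP BY
  start_station_name
ORDER BY
  num_trips DESC
LIMIT 10;
```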