Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Databases SQL Reporting by Example Orientation: Selecting Data Counting Students

Doesn't NANs or blanks in columns change the count?

For example, I answered as follows:

SELECT COUNT(id) AS "Number of Students" FROM students;

Assuming that just because a student's information is present does not mean they are not a prospective student, but all enrolled students would have an enrollment id. I realize two flaws in this logic. 1. Alumni/transferred-out students often retain their original ids so having an id alone does not single-handedly mean the student is an active student at Jefferson Middle School. 2. As the primary key field, these values are likely autogenerated. However, assuming these were not the case, according to this video the count would not change as it only counts the total rows so much so that you can write any hogwash in the count parenthesis and receive the same answer regardless even if the information is missing (as long as there is data in any column of the row at all.

1 Answer

The point of this line is to demonstrate how to use COUNT by counting the total number of rows.

COUNT, like all aggregate functions, ignore NULL values, so if you use a column that contains NULL values, you will not get the total number of rows in the table. In the course's example there are no NULL values at all in the STUDENTS table, so it doesn't matter which column you use. However in a productive database NULL values are often present in some columns. The id column as the primary key never contains NULL values, so it's a good pick. I always use the primary key column to count rows. It's a good practice that prevents mistakes.

In a real database, you would of course restrict it to the students who are active using a WHERE clause, but this is a simple example to demonstrate one function. Obviously you expand on it as needed.

Also, you can always assume a primary key is automatically generated. If the primary key is missing or not unique, your DB breaks. Your application breaks. Everything breaks. As a database designer, you want to be 120% sure that this cannot happen. Of course you can also use COUNT(*) wich just counts all rows regardless of content.