Why I Never Use SELECT * FROM

20 September 2010

I’ve heard for many years that using SELECT * FROM is a bad idea. I’ve always agreed with this in principle, but in practice often haven’t followed this guideline. That’s all about to change, as I was caught out by what (in my opinion) is a bug in Microsoft SQL Server that totally blew my mind.

I’d always thought the only reason why SELECT * FROM was frowned upon was that it returned an unnecessarily large amount of data from the server, and client applications will typically want to consume only a handful of the returned columns. This opinion was changed when I came across the bug I describe in this post.

Consider the following table and view:

use tempdb
go

create table [A](
    [ID] int not null identity(1,1),
    [ColA] nvarchar(100) null,
    [ColB] nvarchar(100) null,
    [ColC] nvarchar(100) null,
    CONSTRAINT [PK_A] PRIMARY KEY ([ID])
)
go

insert into A([ColA], [ColB], [ColC])
values
    ('ColA Val1','ColB Val1','ColC Val1'),
    ('ColA Val2','ColB Val2','ColC Val2'),
    ('ColA Val3','ColB Val3','ColC Val3'),
    ('ColA Val4','ColB Val4','ColC Val4')
go

create view [AView]
as
    SELECT * FROM [A]
go

If we were then to run the View, we get results as expected.

Now, imagine there’s a requirement to add a new column to the table [A], somewhere in the middle of the list of columns above. For example, we want a new column called [NewCol] of type int, with default –1, between [ColB] and [ColC]. We can either make this change in SQL Server Management Studio, or else run the below script (which was generated from within SSMS).

CREATE TABLE [Tmp_A]
    (
    [ID] int NOT NULL IDENTITY (1, 1),
    [ColA] nvarchar(100) NULL,
    [ColB] nvarchar(100) NULL,
    [NewCol] int NOT NULL DEFAULT -1,
    [ColC] nvarchar(100) NULL
    )  ON [PRIMARY]
GO

SET IDENTITY_INSERT [Tmp_A] ON
GO
INSERT INTO [Tmp_A] ([ID], [ColA], [ColB], [ColC])
        SELECT [ID], [ColA], [ColB], [ColC] FROM [A]
GO
SET IDENTITY_INSERT [Tmp_A] OFF
GO
DROP TABLE [A]
GO
EXECUTE sp_rename N'Tmp_A', N'A', 'OBJECT'
GO
ALTER TABLE [A] ADD CONSTRAINT
    [PK_A] PRIMARY KEY CLUSTERED (ID)
GO

Now, if were to execute a select on [AView], we get some rather unexpected results:

ID	ColA	ColB	ColC
1	ColA Val1	ColB Val1	-1
2	ColA Val2	ColB Val2	-1
3	ColA Val3	ColB Val3	-1
4	ColA Val4	ColB Val4	-1

What has happened is that the view outputs the heading columns that were in place at the time the view was created, but the data is taken from the first four columns of the underlying table, which now includes the new column [NewCol] instead of [ColC].

My client application was attempting to read the column [ColC] from the returned dataset and was producing unexpected results, as it was reading data from the underlying column [NewCol].

My expectation would be that the view would either continue to return the four columns it originally did, or else return all five columns of data. Returning the original column headings but with the new data seems odd at best.

What really surprised me is that the above sample causes the problem to occur in all versions of SQL Server from at least SQL Server 2000 right through to 2008.