Thursday, July 03, 2008

Log Buffer: #104: a Carnival of the Vanities for DBAs

Today, 4th July, is Independence Day. I know this because Tech Republic has sent me an e-mail of special Independence Day offers. Only not that special, as the list seems to be the same list of offers they mailed for Father's Day. At least that made sense: after all, nothing says "You're the best dad in the world" quite like a gift of the Administrator's Guide to TCP/IP But what sort of patriot celebrates Independence Day by settling down with IT Professional's Guide to Policies and Procedures, Third Ed instead of fireworks, corn dogs and "light tasting" beer? Probably the sort of patriot who reads Log Buffer, so I'd better get on with it.

Staying with the Independence Day theme Curt Monash picks up on a humourous press release from data warehouse appliance vendor Dataupia. It's in the form of a Declaration of Data Independence and is probably funnier if you're American.

In the UK there have been rumours that the government is planning a giant database to track all our telephone and internet activity. On the BCS blog David Evans skips the ethical dimensions and looks at some of the practical considerations. However, the most pertinent point is made by Matthew in the comments: "How many days after the launch of the Big Brother Database ... do you think it will be before someone loses a disk or backup tape full of its contents?"

I'm just an Oracle person, which according to Max Kanat-Alexander means I suffer from Oracle-itis. Apparently symptoms include not being able to recognise the difference between NULL and an empty string, and thinking that one thousand items is a sensible limit for an IN clause. Kevin Closson posted a suitably withering response in his series on things which doth crabby make.

Anyway, doing the Log Buffer has given me - with the assistance of David Edwards and Google - with some exposure to other databases and other ways of doing things. For instance, Leo Hsu and Regina Obe wrote about inheriting tables in PostgreSQL. This is quite a neat idea.
"lets say you developed a timesheet app for an organization and each department insisted on having their own version of the app and each along with the basic fields needed to track some additional ones of their own. Then higher forces came in and said I need to know what everyone is doing, but I don't need to see all that other crap they keep track of.. Two options come to mind - create a bunch of views that union stuff together or institute a round-up-the-children-and-adopt-them program."
In Oracle the only option would be the view (possibly of the materialized kind). Off the top of my head I can't recall a case where I could have used this but it's definitely the sort of capability it's nice to have in your back pocket.

Another intriguing idea which has no parallel in Oracle is the MySQL Sandbox. This is a framework for testing features of different versions of MySQL without jeopardising our primary environment. Its developer, Giuseppe Maxia, The Data Charmer announces that MySQL Sandbox 2.0 has been released.

Regardless of which database you use performance is always an issue. Hubert Lubacewski has a offers a technique for identifying who is is trashing the performance of your PostgreSQL database. Arjen Lentz posts a MySQL script for finding useless indexes. The problems are the same, but the metrics are very different from the ones I'm used to in Oracle: "The query returns all indexes in a db where the cardinality is lower than 30% of the rows, thus making it unlikely that the server will ever use that index." Peter Zaitsev on the MySQL Performance Blog discusses the importance of identifying where the bottlenecks are. There's no point in a web developer tweaking CSS or JavaScript if the real problem lies in the database access layer: "get real numbers for your application before you decide." Ken Downs , the Database Programmer, has some general SQL advice on designing your web application's data model. Mr Oracle Index himself, Richard Foote, gives us his 3 Steps To Performance Tuning.

Transaction management is one of those things which varies considerably from product to product. Many Oracle practioners still think MySQL doesn't have transaction management. This is a canard Pythian's own Keith Murphy lays to rest by writing on transactions in InnoDB. In a related post covering transaction basics says he may write further pieces on "the major storage engines and their transactional characteristics". I presume he means the different MySQL storage engines but I think there's scope for a series which covers all the different database products.

For instance, nested transactions in SQL Server strikes me as asking for trouble. Which is why Kalen Delaney rails against the loss of the Sysprocesses.open_tran column in the SQL Server 2005 metadata.
"Sysprocesses contains a columns called open_tran which reflects the transaction nesting of each session. If a session issues four BEGIN TRAN commands, with no COMMITs or ROLLBACKs, their session will have an open_tran value in sysprocesses of 4. Any open_tran value greater than 0 might mean that a transaction is holding locks and blocking other processes, or it might be keeping the transaction log from being cleared. If you ever notice open_tran values in higher than 2 or 3, it's a pretty good indication that a developer doesn't know much about SQL Server transaction management."
Back to Pythian where Sheeri Cabra reviews MONyog, a GUI monitoring tool for MySQL. Overall she is favourably impressed: "MONyog is the best out-of-the-box GUI monitoring tool for MySQL that I have seen.” Although she does have reservations about its logging. Personally I think the name is a mistake: it sounds too much like something out of H P Lovecraft.

Some Oracle stuff now. Chen Shapira, the not-so-simple DBA puts her Statistics degree to good use by building a custom aggregation function that will return a random salary using Oracle's Data Cartridge extensibility features.
"The main challenge was to make the aggregation truly random....Suppose I have three rows. The way aggregation works, I first take two rows and flip a coin to pick one. Now I have a current value - and I have to take the third row and decide if I want to keep the current value or the new one. I can’t flip the coin again - because if the third row has 50% chance to be selected, this means the first and second rows only have 25% chance each. Not fair. So I need to give the third row 1/3 chance, and the current value 2/3."
On Oracle Base, Tim Hall demontrates the long-overdue support for case-sensitive passwords which Oracle have introduced in 11g.

Oracle has acquired the IKAN tool CWD4ALL and they're going to use it to give SQL Developer a decent modelling support capability. I would have though there would be more excitement about this in the blogosphere (certainly the ODTUG Designer listerserver has been cock-a-hoop) but only Dietmar Aust seems to have picked it up. Perhaps hardly anybody cares about modelling, in which case TOAD's marketshare is safe.

Last week I was judging abstracts for the UKOUG 2008 Conference, and there were three submissions for sessions on best practices in programming with ApEx. Alex Gorbachev (Pythian, again) shows why these talks are necessary with an example of poor SQL taken from the official Oracle documentation.

Lot's of people are asking questions. SQLDenis asks rhetorically Sybase IQ Is A Columnar Database, Why Should I Care?
"What does this mean? This mean that the data is stored in columns and not in rows. Inserts are slower that a traditional row based database but selects are many times faster (up to 50 times). The good thing about this technology is that the SQL looks the same, the only difference is that the data is stored in a different way."
Robert Hodges at The Scale-Out Blog wants to know, what's your favorite database replication feature? Call me shallow, but it's not a topic to which I'm given much thought. I can tell you my five all-time top favourite cover versions instead.

Meanwhile Jon Emmons poses the question Ever wonder what your DBAs really do?. It turns out there's more to the job than drinking coffee, swearing at developers and losing the backup tapes. Who knew?

Of course, DBAs have plenty of reasons to swear at developers. In Extreme Makeover - Database Edition CrazyDBA shows us his scars from a SQL Server version upgrade:
"Saturday morning, migrating from "old prod" to "new prod". We finish up during the afternoon. On Sunday evening (yes, more than 24 hours later), we are notified that the system is not performing properly. We double check things on our end and everything seems to be working, well, except that the duration for some queries have gone from three seconds on "old prod" to twelve minutes on "new prod". Ouch. Our team investigates a bit further and escalates the issue to the (sleeping) onsite team, who pick up their research on Monday morning.

What do we do first on Monday morning? Well, we go to the new test system and run the query. It takes eight minutes. Turns out development is slow as well. Surely someone noticed this during testing, right?"

Er, wrong.

Rick Heiges asks It's Q3 - where is SQL Server 2008? To make him happy (and Mr CrazyDBA even happier), according to Jason Massie there's a rumour that SQL Server 2008 is due to ship next week (or this week if you're reading after the weekend).

From the new releases to some ancient history. Willie Favero comments on an article about DB2's 25th birthday from Information Week. It's interesting to see what counted as a new feature in those days: "You could dynamically add tables or change tables without taking the system down. It doesn't take much imagination now to see this was a huge leap forward," recalled Don Haderle, chief architect of DB2.

Back to the future. Over at the IT Toolbox Lewis Cunningham has his head in the clouds. Or rather Cloud. This is a neat summation of all the main players in Cloud databases. Cloud computing is a rather attractive idea, but I think there is some way to go before it is a practical solution for business. Web access is still far from pervasive or guaranteed, and as the Register pointed out this week, there are still some kinks in the business model to iron out. Lewis describes Amazon as the 800lb gorilla in the cloud space (stratosphere?) and the Register also has a good piece explaining Amazon's interest in the technology.

In his Data Migration blog Johny Morris (no, not that one) invites us to consider the benefits of meetings, in this case Data Quality Rules meetings: "Use them not just instrumentally to solve the issues in front of you but also to build the team that jointly will have uncover all the knowledge hidden in the organisation." Exactly the sort of benefit which will be hard to realise when we are all working in the Cloud and never visit the office.

Finally, nothing to do with databases but I'm sure relevant to us all (at least those who are still office bound), Suzanne Thornberry at Tech Republic writes about the health risks IT professionals run. These include such things as eye strain, bobblehead syndrome and seated immobility thromboembolism (SIT), which is like DVT only worse. So stop reading this and go do something more healthy instead!

Postscript


The Log Buffer is a community activity facilitated by Pythian. Find out more.

4 comments:

Alex Gorbachev said...

Cool Log Buffer. Thanks.
Here is another note about Dataupia (Kevin's blog was mentioned with another post above)/

Anonymous said...

Any date set yet for that top-5 covers blog entry?

Anonymous said...

Great one, Andrew. I agree, I want to see that top-5 covers entry. But be warned, Howard will probably unsubscribe from your blog.

Cheers

APC said...

>> Howard will probably unsubscribe from your blog.

I think I could live with that ;)

cheers, APC