Friday, November 23, 2007

Multicollinearity

The predictors should not correlate. In the stepwise selection one variable might take the prediction of the another variable into the model and the second variable will not be taken in to the model. The variables might get also inconsistent correlation coefficients (negative if the effect is positive).

If we use wait events, there will in a lot of cases be multicollinearity. In other words, if we see a lot of waits on "db file sequential read", it may be expected that we will also see significant waits on "db file scattered read".

Using v$sysstat makes this even more of a problem. For example, "user I/O wait time" will almost certainly be correlated with "physical reads".

As such, we need to programmatically identify those predictors that are not correlated with each other.

Tuesday, November 20, 2007

C callouts from mySQL

While I was trying to find a way to use the statistical functions built into mySQL for performance measurement, I discovered the ability to create user defined functions that use C callouts from within the database. I knew you could create your own functions using SQL inside the database, but this allows us to expose OS functionality/information if we need it.



This is pretty cool, because we can extend the functionality of mySQL (subject to limitations) without changing the source code.



Below is a very simple example of getting the hostname on which the server runs. This is already this functionality in mySQL (“show variables like ‘%host%’), but as noted this is just a simple example. We may have other ways that we could use this technology, such as sending mail from within the database…



[root@linux3 lib]# cat foo.c
#include
#include
#include
#include
#include
#include
#include
#include

typedef unsigned long long ulonglong;
typedef long long longlong;

#ifdef __cplusplus
extern "C" {
#endif

//init function is required, and a deinit function is optional.
//These can be used to setup for the function call, and then clean
//up afterwards.

my_bool my_machine_name_init(UDF_INIT *initid,
UDF_ARGS *args,
char *message){
return 0;
}

char* my_machine_name(UDF_INIT *initid,
UDF_ARGS *args,
char* result,
unsigned long* length,
char *is_null,
char *error){
gethostname(result,1000);
*length = strlen(result);
return result;
}


We then compile the C code above into a shared library, visible in /usr/lib so mySQL is already aware of it (in the LD_LIBRARY_PATH, but could be anywhere mySQL “knows” about)…


[root@linux3 lib]# gcc -shared -o foo.so foo.c


…and then create a function in the database that references this library.


mysql> drop function my_machine_name;
Query OK, 0 rows affected (0.00 sec)
mysql> create function my_machine_name returns string soname 'foo.so';
Query OK, 0 rows affected (0.00 sec)
mysql> select my_machine_name();
+--------------------------+
| my_machine_name() |
+--------------------------+
| linux3 |
+--------------------------+
1 row in set (0.01 sec)

mysql> exit
Bye
-bash-3.00$


Like I said, what is above is pretty simple, but I am excited about the possibilities. The code has to be thread safe, and well tested, but it does open some doors for us if the need arises.

Saturday, November 10, 2007

A recent question on CDOS was regarding how to have only one space at the end of a CSV output row. Below was my response...

SQL> create table t1110(c varchar2(10))
2 /

Table created.

SQL> insert into t1110 values('1')
2 /

1 row created.

SQL> insert into t1110 values('2 ')
2 /

1 row created.

SQL> insert into t1110 values('3 new ')
2 /

1 row created.

SQL> column len format 9
SQL> column orig format a10
SQL> column newone format a10
SQL> column new_len format 9
SQL> set trims off
SQL> select length(c) len,
2 c orig,
3 regexp_replace(c, '\W+$', ' ') newone,
4 length(regexp_replace(c, '\W+$', ' ')) new_len
5 from t1110
6 /

LEN ORIG NEWONE NEW_LEN
--- ---------- ---------- -------
1 1 1 1
5 2 2 2
7 3 new 3 new 6

SQL>

Friday, November 02, 2007

Control file writes when updating???

I had never seen this before, but while testing a migration strategy for a large table with XMLTYPE's, almost half our wait time was spent on control file sequential read and control file parallel write waits. Each of these occurred immediately after a "direct path write" wait.

When I dug a little further, I found that we had the table set to NOLOGGING. We had not set the LOBS to NOCACHE or NOLOGGING. However, evidently that was trumped, and the unrecoverable changes caused each update to write the corresponding SCN to the control file.

We set an event (10359) that eliminated control file writes for our session, and improved our performance by almost 50% (27 minutes for our test case down to 16)!!!
Locations of visitors to this page