Howto: Enable Concurrent Remote Desktop Sessions in Windows XP SP2

June 13th, 2007

Windows XP, unlike the Server editions of Windows, does not allow multiple simultaneous remote user sessions. This can be a pain. For instance, my wife has a user account on my computer at home which she often uses when I am not at the computer. When I am at work (or where ever) and want to remote into my computer at home, I think I should be able to, even if Vanessa is using the computer. I mean, it is totally possible and built into any good OS worth a damn. I do run Windows XP Pro after-all, the more expensive version of XP for profession users. However Windows XP forces you to boot (log off) the other user before you can log in. This is most likely because Micro$oft wants you to shell out more money for the server OS. Even if you use the server edition of Windows, there are licensing issues with it that prevent you from running certain programs that you are only allowed to run on non-server editions of Windows. Oh for crying out loud! What a stupid pointless catch 22. I guess Micro$oft doesn’t see it that way because it forces you to buy more of their software.

This article outlines how to get around this annoying, draconian limitation imposed by Micro$oft.

Read the rest of this entry »

INSERT … ON DUPLICATE KEY UPDATE

June 6th, 2007

MySQL has some nifty extensions to ANSI SQL. Two of which are “INSERT … ON DUPLICATE KEY UPDATE” and “REPLACE” (As of MySQL 4.1). These statements are used when inserting data into a table where the unique key of the row you are inserting may already exist, and if it does, you want this row to be updated with new data. Instead of doing a select/insert/update routine, this gives a significant performance increase for frequently updated schemas; and what is even more important, the application code becomes simpler, less error prone, and easy to read.

“INSERT … ON DUPLICATE KEY UPDATE” will actually perform either an UPDATE or an INSERT, whichever one is necessary. The “REPLACE” statement will actually delete the row if it exists and insert the new row.

Example of “INSERT … ON DUPLICATE KEY UPDATE” (assuming column ‘a’ is the unique key):

1
2
3
INSERT INTO TABLE (a,b,c)
VALUES (1,2,3)
ON DUPLICATE KEY UPDATE b=2, c=3;

Example of “REPLACE”:

1
2
REPLACE INTO table_a (a,b,c)
VALUES(1,2,3);

Oracle 9i+ also has an extension that can somewhat similarly handle this conditional INSERT-or-UPDATE task: the MERGE statement, though not as intuitive and the code is not as easy to read. Here is an example of how one might implement MERGE for this situation:

1
2
3
4
5
6
7
8
9
10
11
MERGE INTO SCHEMA.table_a TA
USING (
 SELECT '1' a, '2' b, '3' c
 FROM DUAL
 ) E
ON (TA.a = E.a)
WHEN MATCHED THEN
 UPDATE SET TA.b = E.b, TA.c = E.c
WHEN NOT MATCHED THEN
 INSERT (TA.a, TA.b, TA.c)
 VALUES (E.a, E.b, E.c);

Comparing Oracle’s 11 line method to MySQL’s 3 line method, I must hand it to MySQL for its simpler and superior method.

References:
forums.devshed.com, dev.mysql.com, dev.mysql.com, www.mysqlperformanceblog.com, www.oracle.com

More C++ Preprocessor Directives

June 5th, 2007

The #pragma directive is a compiler specific directive which compiler vendors may use for their own purposes. For instance, a #pragma is often used to allow suppression of specific error messages, manage heap and stack debugging, etc.

You can see the C# .Net equivalent here.

In Microsoft Visual C++ 8 (Visual Studio 2005), one useful use for the #pragma directive is to modify compiler warnings. This is from Microsoft:

#pragma warning( warning-specifier : warning-number-list [; warning-specifier : warning-number-list...] )
#pragma warning( push[ ,n ] )
#pragma warning( pop )

The warning-specifier can be one of the following.

Warning-specifier Meaning
1, 2, 3, 4 Apply the given level to the specified warning(s). This also has the effect of turning a specified warning on that is off by default.
default Reset warning behavior to its default value. This also has the effect of turning a specified warning on that is off by default. The warning will be generated at its default, documented, level.See Compiler Warnings That Are Off by Default for more information.
disable Do not issue the specified warning message(s).
error Report the specified warnings as errors.
once Display the specified message(s) only once.
suppress Pushes the current state of the pragma on the stack, disables the specified warning for the next line, and then pops the warning stack, resetting the pragma state. You can only specify one warning for each suppress specifier, but multiple warning pragmas can operate on one line of code.suppress is only supported for C6000 warnings (code analysis warnings), which are enabled with the /analyze (Enterprise Code Analysis) compiler option.

The warning-number-list can contain any warning numbers. Multiple options can be specified in the same pragma directive as follows:

#pragma warning( disable : 4507 34; once : 4385; error : 164 )

This is functionally equivalent to:

// Disable warning messages 4507 and 4034.
#pragma warning( disable : 4507 34 )
 
// Issue warning 4385 only once.
#pragma warning( once : 4385 )
 
// Report warning 4164 as an error.
#pragma warning( error : 164 )

The compiler will add 4000 to any warning number that is between 0 and 999.

The compiler only supports up to 56 #pragma warning statements in a compiland.

For warning numbers in the range of 4700-4999, those associated with code generation, the state of the warning in effect when the compiler encounters the open curly brace of a function will be in effect for the rest of the function. Using the warning pragma inside the function to change the state of a warning greater than 4699 will only take effect after the end of the function. The following example shows the correct placement of warning pragmas to disable, and then restore, a code-generation warning message:

// pragma_warning.cpp
// compile with: /W1
#pragma warning(disable:4700)
void Test() {
   int x;
   int y = x;   // no C4700 here
   #pragma warning(default:4700)   // C4700 enabled after Test ends
}
 
int main() {
   int x;
   int y = x;   // C4700
}

Note that within a function body, the last setting of the warning pragma will be in effect for the entire function.

The warning pragma also supports the following syntax:

#pragma warning( push [ ,n ] )

#pragma warning( pop )

Where n represents a warning level (1 through 4).

The pragma warning( push ) stores the current warning state for all warnings. The pragma warning( push, n) stores the current state for all warnings and sets the global warning level to n.

The pragma warning( pop ) pops the last warning state pushed onto the stack. Any changes made to the warning state between push and pop are undone. Consider this example:

#pragma warning( push )
#pragma warning( disable : 4705 )
#pragma warning( disable : 4706 )
#pragma warning( disable : 4707 )
// Some code
#pragma warning( pop )

At the end of this code, pop restores the state of all warnings (including 4705, 4706, and 4707) to what it was at the beginning of the code.

When you write header files, you can use push and pop to ensure that changes to warning states made by the user do not prevent your headers from compiling properly. Use push at the beginning of the header and pop at the end. Suppose, for example, you have a header that does not compile cleanly at warning level 4. The following code changes the warning level to 3 then restores the original warning level at the end of the header:

#pragma warning( push, 3 )
// Declarations/ definitions
#pragma warning( pop )

See /FI and /w for compiler options that help you suppress warnings.

The top 10 dead (or dying) computer skills

May 25th, 2007

I found this interesting article on computerworld.com. Though I think this is a good overview of the diminishing IT skill scene, as Stewart Padveen says in the article, “Obsolescence is a relative — not absolute — term in the world of technology.”

C++ Preprocessor Variables

May 17th, 2007

I went several years programming in C/C++ without really knowing about these preprocessor variables. They can be very useful for debugging and logging of custom run time errors/warnings. I have not found a definitive list of them, but here is a list of some useful ones that I know of. (The following variables can vary by compiler, but generally work)

__LINE__ Contains the current line number being processed.
__FILE__ Contains the current file name (full path) being processed.
__DATE__ Contains the compile date, This is the date that the file was compiled, not necessarily the current date.
__TIME__ Contains the compile time. This is the time that the file was compiled, not necessarily the current time.
__TIMESTAMP__ Contains the current date and time. This is the date and time that the file was compiled, not necessarily the current date and time.
__FUNCTION__ Contains the function name. (this is part of C99, the new C standard. Not all C++ compilers support it)
__cplusplus Defined when compiling a C++ program. In some older compilers, this is also called c_plusplus.
__STDC__ Defined when compiling a C program, and may also be defined when compiling C++

Instead of explaining these variables, here is an example program that demonstrates how you may use some of them:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#include <iostream>
 
#define PUKE_TRACE(X,Y) puke_trace(__FUNCTION__, __FILE__, __LINE__, __threadid(), X, Y)
void puke_trace(const char* func, const char* file, int line,
				unsigned long thread, const char* msg, std::ostream& os)
{
	os << msg << " : Line " << line << ", file " << file
		<< ", function " << func << ", thread " << thread << ".\n";
}
 
int main(int argc, char* argv[])
{
	PUKE_TRACE("Test 1.",std::cout);
	PUKE_TRACE("Test 2.",std::cerr);
	PUKE_TRACE("Test 3.",std::clog);
	return 0;
}

The above program will output something like this:

Test 1. : Line 13, file /path/to/test.cpp, function main, thread 3712.
Test 2. : Line 14, file /path/to/test.cpp, function main, thread 3712.
Test 3. : Line 15, file /path/to/test.cpp, function main, thread 3712.

__threadid() is not a preprocessor variable obviously, but it can be useful when tracing/debugging/logging multi-threaded applications, usually used in conjunction with preprocessor variables.

References:
http://www.codeguru.com/forum/showthread.php?t=231043
http://www.cppreference.com/preprocessor/preprocessor_vars.html

Visual Studio 2005 Web Test Data Binding

April 18th, 2007

Data binding is an important feature in Visual Studio web testing. It allows you to test a web app against random, sequential, or unique iterations of a set of test data. This allows more complete and comprehensive tests as opposed to running the same static test data through the web app on every single iteration of the test.

When running a load test, it may not be suitable to use a remote data source for data binding, especially if the data you need is in a database that gets accessed by the web app you are testing. This is where a local data source may work best, such as CSV file, MS Access, or MS Excel spreadsheet files.

I found it not intuitive at all to setup data binding to an Excel spreadsheet, so I thought I would share what I found from a couple sites. Mainly, this page.

Adding a CSV Data Source

Follow these steps to add a CSV file as a data source

1.) Create a directory to hold your CSV file

2.) Place your CSV file in the directory and make sure the file has a header row

3.) In the Web Test Editor click the Add Data Source button on the toolbar.

4.) Select “Microsoft Jet 4.0 OLE DB Provider” from the OLE DB Provider drop down.

5.) In the server or file name text box, enter the directory that the CSV file exists in. Do not enter the file name; just enter the directory where the file is located.

6) Click the advanced button.

7.) Click on Extended Properties.

8.) Set the value equal to: text

9.) Click Ok to close the advanced editor.

10.) Click Ok to close the connection property dialog.

11.) Click the check box for the files with the data you would like to use for this test case then click Ok.

12.) Now the text file is ready to use a data source.

Adding an Access Database as a Data Source

After setting up the database, perform the following steps to add the data source.

1.) In the Web Test Editor click the Add Data Source button on the toolbar.

2.) Select “Microsoft Jet 4.0 OLE DB Provider” from the OLE DB Provider drop down.

3.) In the server or file name text box, enter the directory and file name of the access database. i.e. c:\temp\databinding.mdb

4.) Click Ok to close the connection property dialog.

5.) Choose the tables you would like to include for data binding then click Ok. Now the access database is ready to use a data source.

Adding a SQL Server Database as a Data Source

After setting up the database, perform the following steps to add the data source.

1.) In the Web Test Editor click the Add Data Source button on the toolbar.

2.) Select “Microsoft OLE DB Provider for SQL Server” from the OLE DB Provider drop down.

3.) In the server or file name text box, enter the database server

4.) Enter the username and password or select the “Use Windows NT Integrated Security” option.

5.) If you entered a username and password check the “Allow saving password” checkbox.

6.) Choose the database name from the dropdown list.

7.) Click Ok to close the editor.

8.) Choose the tables you would like to include for data binding then click Ok. Now the SQL Server database is ready to use a data source.

Adding an Excel Spreadsheet as a Data Source

One excel worksheet can define multiple tables that can be used for data binding. In order to use an Excel spread sheet as a data source, you need to do the following.

1.) Create an excel worksheet.

2.) The first row of your table should be column headers. You can have multiple tables on one work sheet or spread the tables across multiple worksheets. The following steps need to be done for each table which will be used.

3.) Highlight the entire table including the column headers.

4.) On the insert menu point to name and then click define.

5.) Type a name for this selection and click Ok.

6.) This process needs to be done for each table in the workbook that will be used for data binding.

7.) When you are done save the workbook and exit excel.

 

Note: If you add more rows to a table after you define it, you will need to update the definition by doing this process again. Otherwise the new data will not be available for testing.

The next step is to add the data source to the test case. Follow these steps for that process

1.) In the Web Test Editor click the Add Data Source button on the toolbar.

2.) Select “Microsoft Jet 4.0 OLE DB Provider” from the OLE DB Provider drop down.

3.) In the server or file name text box, enter the directory and file name of the excel spreadsheet. i.e. c:\temp\book1.xls

4.) Click the advanced button.

5.) Click on Extended Properties.

6.) Set the value equal to: Excel 8.0

7.) Click Ok to close the advanced editor.

8.) Click Ok to close the connection property dialog.

9.) Choose the tables you would like to include for data binding then click Ok. Note: The worksheet names appear in this list with a $ after them. You can use these for table data. If you try to use these, the test will hang. You need to select the name you gave the table in your worksheet.

10.) Now the excel file is ready to use a data source.

 

Additional Considerations when adding a Data Source

If you are going to be executing a web test on a Controller/Agent setup, you need to consider the location of the data source before creating the connection string. The data for a data source is loaded on each agent machine. So if you create a data source and set the location to c:\test\datadinding.mdb because this is were it is located on the client machine, the agent will also expect the MDB file to be located in the same location on the agent machine. There are 2 options for handling this problem:

1.) Place your data sources on a network share that each agent can access. So when you create the data source, you would use some like \\machine\datasources\datadinding.mdb. Since each agent has access to this location, the connection string will work for each agent.

2.) The other option is to copy the data source to the same location on each agent machine. So if you create your connection string with c:\test\databinding.mdb, then you would need to create a c:\test directory on each agent and copy the databinding.mdb file to the directory.

10 Tips for Optimizing MySQL Queries

April 16th, 2007

About a week or 2 ago, Justin Silverstone posted an article on his blog, “10 tips for optimizing mysql queries“. While not an expert list, the points hold true, however small the points may be. This was posted on digg.com and it was promoted to the front page. A few days later, someone by the name of Jesse posted an article titled, “10 Tips for Optimizing MySQL Queries (That don’t suck)“. These tips are more comprehensive. However, the way in which he/she (Jesse is a coed name, right?) went about corrected Justin was an insulting and demeaning call-out. It seems very juvenile to me, but Jesse did have some very smart things to say based on what appears to be a good deal of experience. These points are worth taking note of, however I do not want the negative part on my site, so I am cutting it out and leaving only the smart stuff Jesse had to say on the subject:

  1. Benchmark, benchmark, benchmark!

    You’re going to need numbers if you want to make a good decision. What queries are the worst? Where are the bottlenecks? Under what circumstances am I generating bad queries? Benchmarking is will let you simulate high-stress situations and, with the aid of profiling tools, expose the cracks in your database configuration. Tools of the trade include supersmack, ab, and SysBench. These tools either hit your database directly (e.g., supersmack) or simulate web traffic (e.g., ab).

  2. Profile, profile, profile!

    So, you’re able to generate high-stress situations, but now you need to find the cracks. This is what profiling is for. Profiling enables you to find the bottlenecks in your configuration, whether they be in memory, CPU, network, disk I/O, or, what is more likely, some combination of all of them.

    The very first thing you should do is turn on the MySQL slow query log and install mtop. This will give you access to information about the absolute worst offenders. Have a ten-second query ruining your web application? These guys will show you the query right off.

    After you’ve identified the slow queries you should learn about the MySQL internal tools, like EXPLAIN, SHOW STATUS, and SHOW PROCESSLIST. These will tell you what resources are being spent where, and what side effects your queries are having, e.g., whether your heinous triple-join subselect query is sorting in memory or on disk. Of course, you should also be using your usual array of command-line profiling tools like top, procinfo, vmstat, etc. to get more general system performance information.

  3. Tighten Up Your Schema

    Before you even start writing queries you have to design a schema. Remember that the memory requirements for a table are going to be around #entries * size of a row. Unless you expect every person on the planet to register 2.8 trillion times on your website you do not in fact need to make your user_id column a BIGINT. Likewise, if a text field will always be a fixed length (e.g., a US zipcode, which always has a canonical representation of the form “XXXXX-XXXX”) then a VARCHAR declaration just adds a superfluous byte for every row.

    Some people poo-poo database normalization, saying it produces unecessarily complex schema. However, proper normalization results in a minimization of redundant data. Fundamentally that means a smaller overall footprint at the cost of performance — the usual performance/memory tradeoff found everywhere in computer science. The best approach, IMO, is to normalize first and denormalize where performance demands it. Your schema will be more logical and you won’t be optimizing prematurely.

  4. Partition Your Tables

    Often you have a table in which only a few columns are accessed frequently. On a blog, for example, one might display entry titles in many places (e.g., a list of recent posts) but only ever display teasers or the full post bodies once on a given page. Horizontal vertical partitioning helps:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    
    CREATE TABLE posts (
        id int UNSIGNED NOT NULL AUTO_INCREMENT,
        author_id int UNSIGNED NOT NULL,
        title varchar(128),
        created timestamp NOT NULL,
        PRIMARY KEY(id)
    );CREATE TABLE posts_data (
        post_id int UNSIGNED NOT NULL,
        teaser text,
        body text,
        PRIMARY KEY(post_id)
    );

    The above represents a situation where one is optimizing for reading. Frequently accessed data is kept in one table while infrequently accessed data is kept in another. Since the data is now partitioned the infrequently access data takes up less memory. You can also optimize for writing: frequently changed data can be kept in one table, while infrequently changed data can be kept in another. This allows more efficient caching since MySQL no longer needs to expire the cache for data which probably hasn’t changed.

  5. Don’t Overuse Artificial Primary Keys

    Artificial primary keys are nice because they can make the schema less volatile. If we stored geography information in the US based on zip code, say, and the zip code system suddenly changed we’d be in a bit of trouble. On the other hand, many times there are perfectly fine natural keys. One example would be a join table for many-to-many relationships. What not to do:

    1
    2
    3
    4
    5
    6
    7
    
    CREATE TABLE posts_tags (
        relation_id int UNSIGNED NOT NULL AUTO_INCREMENT,
        post_id int UNSIGNED NOT NULL,
        tag_id int UNSIGNED NOT NULL,
        PRIMARY KEY(relation_id),
        UNIQUE INDEX(post_id, tag_id)
    );

    Not only is the artificial key entirely redundant given the column constraints, but the number of post-tag relations are now limited by the system-size of an integer. Instead one should do:

    1
    2
    3
    4
    5
    
    CREATE TABLE posts_tags (
        post_id int UNSIGNED NOT NULL,
        tag_id int UNSIGNED NOT NULL,
        PRIMARY KEY(post_id, tag_id)
    );
  6. Learn Your Indices

    Often your choice of indices will make or break your database. For those who haven’t progressed this far in their database studies, an index is a sort of hash. If we issue the query SELECT * FROM users WHERE last_name = ‘Goldstein’ and last_name has no index then your DBMS must scan every row of the table and compare it to the string ‘Goldstein.’ An index is usually a B-tree (though there are other options) which speeds up this comparison considerably.

    You should probably create indices for any field on which you are selecting, grouping, ordering, or joining. Obviously each index requires space proportional to the number of rows in your table, so too many indices winds up taking more memory. You also incur a performance hit on write operations, since every write now requires that the corresponding index be updated. There is a balance point which you can uncover by profiling your code. This varies from system to system and implementation to implementation.

  7. SQL is Not C

    C is the canonical procedural programming language and the greatest pitfall for a programmer looking to show off his database-fu is that he fails to realize that SQL is not procedural (nor is it functional or object-oriented, for that matter). Rather than thinking in terms of data and operations on data one must think of sets of data and relationships among those sets. This usually crops up with the improper use of a subquery:

    1
    2
    3
    4
    5
    6
    
    SELECT a.id,
        (SELECT MAX(created)
        FROM posts
        WHERE author_id = a.id)
    AS latest_post
    FROM authors a

    Since this subquery is correlated, i.e., references a table in the outer query, one should convert the subquery to a join.

    1
    2
    3
    4
    5
    
    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a
    INNER JOIN posts p
        ON (a.id = p.author_id)
    GROUP BY a.id
  8. Understand your engines

    MySQL has two primary storange engines: MyISAM and InnoDB. Each has its own performance characteristics and considerations. In the broadest sense MyISAM is good for read-heavy data and InnoDB is good for write-heavy data, though there are cases where the opposite is true. The biggest gotcha is how the two differ with respect to the COUNT function.

    MyISAM keeps an internal cache of table meta-data like the number of rows. This means that, generally, COUNT(*) incurs no additional cost for a well-structured query. InnoDB, however, has no such cache. For a concrete example, let’s say we’re trying to paginate a query. If you have a query SELECT * FROM users LIMIT 5,10, let’s say, running SELECT COUNT(*) FROM users LIMIT 5,10 is essentially free with MyISAM but takes the same amount of time as the first query with InnoDB. MySQL has a SQL_CALC_FOUND_ROWS option which tells InnoDB to calculate the number of rows as it runs the query, which can then be retreived by executing SELECT FOUND_ROWS(). This is very MySQL-specific, but can be necessary in certain situations, particularly if you use InnoDB for its other features (e.g., row-level locking, stored procedures, etc.).

  9. MySQL specific shortcuts

    MySQL provides many extentions to SQL which help performance in many common use scenarios. Among these are INSERT … SELECT, INSERT … ON DUPLICATE KEY UPDATE, and REPLACE.

    I rarely hesitate to use the above since they are so convenient and provide real performance benefits in many situations. MySQL has other keywords which are more dangerous, however, and should be used sparingly. These include INSERT DELAYED, which tells MySQL that it is not important to insert the data immediately (say, e.g., in a logging situation). The problem with this is that under high load situations the insert might be delayed indefinitely, causing the insert queue to baloon. You can also give MySQL index hints about which indices to use. MySQL gets it right most of the time and when it doesn’t it is usually because of a bad scheme or poorly written query.

  10. And one for the road…

    Last, but not least, read Peter Zaitsev’s MySQL Performance Blog if you’re into the nitty-gritty of MySQL performance. He covers many of the finer aspects of database administration and performance.